Posts Tagged ‘statistics’

The Statistics of Sex Tapes

Monday, April 26th, 2010

How long would it take to watch all the porn in the world? Is it even possible? This is what my friends and I were wondering the other day at the pub. Being a bunch of math students, we whipped out a pocket calculator to find out. Here’s what we came up with.

To produce a really conservative estimate, let’s restrict ourselves only to amateur sex tapes produced by couples in the United States. How many are there? How long would it take to watch them all if it were your full-time job? There are currently 307 million people in the United States. Suppose that two-thirds of people will at some point be part of a couple. That’s 102 million couples. Suppose that only 5% of these couples will ever produce a sex tape. That’s about 5.1 million sex tapes produced by people who are currently living in the US. Given that the average life expectancy is 78 years, we can err on the safe side by assuming that the tapes are spread evenly over a lifetime, so about 1.3% of them (one seventy-eighth) will be made this year. Multiply that all out, and we can safely assume that 65,598 sex tapes will be made in the US this year.
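For anyone who wants to check the pub math, here is a minimal Python sketch of the estimate. The variable names are made up for illustration, and every input is one of the guesses from the paragraph above.

```python
# Back-of-the-envelope estimate; every input is a guess from the post.
population = 307_000_000      # people in the US
share_in_couple = 2 / 3       # fraction who will ever be part of a couple
share_taping = 0.05           # fraction of couples who ever make a tape
life_expectancy = 78          # years

couples = population * share_in_couple / 2
tapes_ever = couples * share_taping
# Assume the tapes are spread evenly over a lifetime.
tapes_per_year = tapes_ever / life_expectancy

print(round(couples / 1e6), "million couples")
print(round(tapes_per_year), "tapes per year")
```

Changing any of the guesses at the top shows how sensitive the final number is to them.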

By taking a straw poll, we guessed that the average sex tape would be 10 minutes long after editing, but let’s call it 5 just to be sure. Therefore 327,991 minutes of sex tape will be produced this year. If you unrolled that much VHS tape, it would stretch between Toronto and Montreal. If you put all that porn onto DVDs, it would make a stack 13 stories high. So, is it feasible to watch that much porn?

If watching porn were your full-time job (40 hours per week, all year round, no holidays), you would only be able to watch 38% of the amateur sex tapes made this year in the United States alone. The more you watch, the further behind you fall. I conclude based on this analysis that it is absolutely impossible to watch all the porn in the world.
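The 38% figure can be reproduced the same way. This is only a sketch: it assumes 52 working weeks in the year and starts from the rounded count of 65,598 tapes, so the minutes produced come out one short of the 327,991 quoted above.

```python
tapes_per_year = 65_598       # estimate from earlier in the post
minutes_per_tape = 5          # the cautious guess, half the straw-poll value
minutes_produced = tapes_per_year * minutes_per_tape

hours_per_week = 40
weeks_per_year = 52           # no holidays
minutes_watched = hours_per_week * weeks_per_year * 60

fraction = minutes_watched / minutes_produced
print(f"{fraction:.0%} of this year's tapes")
```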

The Collatz Conjecture

Friday, February 19th, 2010

I have been fascinated by the Collatz conjecture for years. It’s a math problem that is so simple to understand, yet no mathematician has managed to solve it. Since the problem was first proposed by Lothar Collatz in 1937, many mathematicians have gone crazy trying to solve it. Here’s how it works.

The Collatz function takes one positive whole number, denoted by n, and turns it into another. If n is even, the result is n/2. If n is odd, the result is 3n+1. For example, if n is 3 then the result is 10. If n is 4 then the result is 2. If n is 5 then the result is 16. You get the idea.
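In Python, the rule is only a few lines. The function name `collatz` is just a label chosen for this sketch.

```python
def collatz(n):
    """Apply one step of the Collatz function to a positive integer."""
    if n % 2 == 0:
        return n // 2      # even: halve it
    return 3 * n + 1       # odd: triple it and add one

print(collatz(3), collatz(4), collatz(5))  # 10 2 16
```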

A Collatz sequence is formed by starting with a number, and repeatedly applying the Collatz function to extend the sequence. For example, the Collatz sequence for n=3 is 3-10-5-16-8-4-2-1. The Collatz sequence for n=5 is 5-16-8-4-2-1. The Collatz sequence for n=2 is just 2-1. Here is a picture of the sequences for n=7 and n=19.
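Building a sequence in code is a short loop. This is a sketch with a made-up function name; note that the loop only ends if the sequence actually reaches 1, which is exactly what nobody has proved in general.

```python
def collatz_sequence(n):
    """Return the Collatz sequence starting at n, stopping at the first 1."""
    seq = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq

print("-".join(map(str, collatz_sequence(3))))  # 3-10-5-16-8-4-2-1
```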

After looking at some of these sequences, you may start to notice a pattern. No matter how high they go, they always seem to come back down to 1. Does every Collatz sequence always come back down to 1? Perhaps some of them get into a loop and keep going round forever, or perhaps some of them just keep going up and up towards infinity. The Collatz conjecture says that neither of those things ever happens: every Collatz sequence eventually comes back down to 1. The challenge is to prove it.

Another way to picture the problem is using total stopping times. The total stopping time of a number is the number of steps it takes for the number’s sequence to reach 1. The Collatz conjecture states that every positive whole number has a finite total stopping time. Despite being such a simple problem, and being open for almost 75 years, nobody has managed to prove or disprove the Collatz conjecture.
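Counting the steps is a small variation on following the sequence. A minimal sketch (the function name is an illustration, and the loop would run forever on any counterexample to the conjecture):

```python
def total_stopping_time(n):
    """Count the Collatz steps needed to get from n down to 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

# The sequence 3-10-5-16-8-4-2-1 has seven steps.
print(total_stopping_time(3))  # 7
```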

There are a couple of really tantalizing patterns in the total stopping times. Let me show you just one. Have a look at the first 1000 total stopping times.

The first 100 total stopping times. The horizontal axis is n, and the vertical axis is the total stopping time of n.

Notice any patterns? It looks like some of the points are bunching up into short horizontal lines. Furthermore, these short horizontal lines seem to line up in large sweeping curves. To investigate these patterns further, I wrote a short C++ program to draw the same chart, except for the first ONE BILLION stopping times. This time, the x-axis is logarithmic.
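The C++ program itself is not reproduced here. As a rough idea of how such a table can be built, here is a Python sketch (not the original code, and far too slow for a billion values) that reuses earlier results and then looks for the "short horizontal lines", meaning runs of consecutive n with the same total stopping time. Both function names are invented for the example.

```python
def stopping_times(limit):
    """Total stopping times for 1..limit, reusing already-computed values."""
    times = [0] * (limit + 1)          # index 0 is unused; times[1] is 0
    for start in range(2, limit + 1):
        n, steps = start, 0
        # Walk until we drop below start, where the answer is already known.
        while n >= start:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            steps += 1
        times[start] = steps + times[n]
    return times

def longest_run(times):
    """Find the longest stretch of consecutive n sharing one stopping time."""
    best_start, best_len, run_start = 1, 1, 1
    for n in range(2, len(times)):
        if times[n] != times[n - 1]:
            run_start = n              # a new run begins here
        if n - run_start + 1 > best_len:
            best_start, best_len = run_start, n - run_start + 1
    return best_start, best_len

times = stopping_times(1000)
print(times[28], times[29], times[30])  # 18 18 18
print(longest_run(times))               # (first n of the run, run length)
```

The numbers 28, 29 and 30 are a small example of a horizontal line: three neighbors that all take 18 steps to reach 1.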

The first one billion total stopping times. The horizontal axis is n, and the vertical axis is the total stopping time of n.

Whoa, now that’s a pattern! Look at all those nice straight lines. Check out a closeup of the bottom-right part of that image.

Close-up of the patterns in the total stopping times

What is causing all those straight lines? Why are they all the same length? The most interesting question to me is, why is there always a long “dash” followed by a short “dot”? If we can explain the structure of these regular patterns, can we construct an exact probability distribution for the total stopping time of any number? Could this distribution be the key to finally proving or disproving the Collatz conjecture? If you want to do a bit of work on the Collatz conjecture, answering these questions might be a good place to start.

Statistical Properties of Alphaghetti

Saturday, January 23rd, 2010

The other day I was enjoying one of my favorite snacks, Alphaghetti, and I began to ask myself a few questions.

How many letters are in a can of Alphaghetti?  Are all 26 letters included, or are there some that are missing?  Is there a roughly equal number of each letter, or are some letters consistently more common than others?  To answer these questions, I decided to do a scientifically rigorous analysis of the statistical properties of Alphaghetti.

The first step is to collect data.  I grabbed a can of Alphaghetti, and emptied it into a bowl.  I began picking the letters out one by one, keeping a tally of all 26 letters.  I enlisted the help of my roommate John.

Jeff and John tallying noodles from a can of Alphaghetti

It took about two hours to count all the letters.  There were 853 letters in the can.  The data is here.  There were no letters missing.  The most common letter was “i”, but it was not so far ahead that it couldn’t be a fluke.

Jeff tallying noodles

I ran some chi-squared tests to get a better idea of the frequency distribution amongst the letters.  The first test showed that the letters are clearly not distributed as they are in a large sample of English text.  Another chi-squared test revealed that it’s not totally implausible that the letters are uniformly distributed.  If I had to guess based on this evidence, I would say that the letters in a can of Alphaghetti are uniformly distributed, although this guess is not statistically justified: failing to reject a uniform distribution is not the same as proving it.
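For readers who have not met the test before, here is a minimal Python sketch of the chi-squared statistic. It is not the analysis that was actually run, the function name is invented, and the tallies in the example are made up for a four-letter alphabet; they are not the Alphaghetti data.

```python
def chi_squared(counts, probs=None):
    """Pearson chi-squared statistic for observed counts.

    probs holds the expected proportion of each category; if it is
    omitted, every category is assumed to be equally likely.
    """
    total = sum(counts)
    if probs is None:
        probs = [1 / len(counts)] * len(counts)
    return sum((c - total * p) ** 2 / (total * p)
               for c, p in zip(counts, probs))

# Made-up tallies for a 4-letter alphabet, NOT the real noodle counts.
toy_counts = [30, 20, 25, 25]
stat = chi_squared(toy_counts)
print(stat)  # 2.0
```

For the real can there are 26 letters, so a uniform test has 25 degrees of freedom and expects about 32.8 of each letter out of 853. At the 5% level the critical value for 25 degrees of freedom is roughly 37.65, so a statistic below that would not reject uniformity. To test against English text instead, pass the English letter proportions as `probs`.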

One thing that surprised me was how few of the letters were defective.  I expected that letters like M, W, and Z would break often, but they didn’t.  I only found 30 noodle fragments in the whole can that could not be clearly identified as letters.