What is Bonferroni?
- I've always wondered if Facebook cross-references the events people attend to suggest whether or not they should be friends. If this were the only metric they used to suggest friendship, how many friend suggestions would be false positives?
- How often could potential matches on large dating websites happen by chance?
- And of course, from my field, how many single-nucleotide polymorphisms (SNPs) could appear significant in a genome-wide association study by chance alone?
To put rough numbers on the Facebook example, suppose that:
- There are 1 billion active users.
- Everyone attends an event 1 day in 100.
- There are 200,000 registered events within our scope, which is enough to account for the 10 million people who attend an event on a given day.
- We wade through 1,000 days' worth of event attendance records.
What is the probability that two people were at the same event on two different days?
Assuming everyone attends events at random, the probability that someone attends an event on any given day is 0.01 (1/100). When they do attend, they choose one of the 2e+05 registered events at random. Just to be clear on notation, 2e+05 = 2 × 10^5.
- The probability that any two given people both decide to attend an event on the same day is 0.0001 (1/100 × 1/100).
- The probability that they attend the same event on that day is 0.0001 / 2e+05 (the number of registered events) = 5e-10.
- The chance that they attend the same event on each of two given days is (5e-10) × (5e-10) = 2.5e-19 (note that the event can be a different one on each of the two days).
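- The number of pairs of people is (10^9 choose 2) ≈ 5e+17.
- The number of pairs of days is (1000 choose 2) ≈ 5e+05.
- The expected number of pairs of people who attend the same event on two different days purely by chance is therefore about 5e+17 × 5e+05 × 2.5e-19 = 62,500.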
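
The steps above can be multiplied together to estimate how many pairs of people would share an event on two different days by chance alone. Here is a minimal Python sketch of that arithmetic, using the values from the calculation (attendance probability 0.01 and 2e+05 events). The variable names are my own, and the sketch assumes that attendance is independent across people and days.

```python
from math import comb

# Values taken from the calculation above (treated here as assumptions)
n_people = 10**9        # active users
p_attend = 1 / 100      # chance a given person attends an event on a given day
n_events = 2 * 10**5    # registered events
n_days = 1000           # days of attendance records

# Two given people both attend on one given day AND pick the same event
p_same_event_one_day = p_attend**2 / n_events

# The same coincidence happens on each of two given days
p_same_event_two_days = p_same_event_one_day**2

# Exact counts of candidate pairs
pairs_of_people = comb(n_people, 2)
pairs_of_days = comb(n_days, 2)

# Expected number of purely coincidental (pair of people, pair of days) matches
expected_pairs = pairs_of_people * pairs_of_days * p_same_event_two_days

print(f"P(same event, one day)  = {p_same_event_one_day:.3g}")
print(f"P(same event, two days) = {p_same_event_two_days:.3g}")
print(f"Expected chance pairs   = {expected_pairs:,.0f}")
```

With the exact binomial counts this comes to roughly 62,400 pairs; rounding the pair counts to 5e+17 and 5e+05 gives 62,500.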
In my own experience, I know I've seen a number of suggested friends who I have nothing in common with except similar mutual friends. Maybe friend suggestions can be improved by incorporating this information.
Of course, I deactivated my Facebook account in 2009 and haven't been back since. Perhaps they already leverage this information...