Tom Francis - SWSports

EVAH!!!

Via Iowahawk.

http://iowahawk.typepad.com/iowahawk...-and-urns.html

Balls and Urns

Statisticians love balls and urns. A typical Stats 101 midterm, for
example, usually includes a question along these lines:

"You take a simple random sample of 1000 balls from an urn containing
120,000,000 red and blue balls, and your sample shows 450 red balls
and 550 blue balls. Construct a 95% confidence interval for the true
proportion of blue balls in the urn."

After choking back a giggle about "blue balls," you whip out your
calculator and text your frat brother who has a copy of last
semester's midterm. He instantly recognizes the correct formula is

95% confidence interval for P = p +/- 1.96 * sqrt( p*(1-p) / n) * FPC

where
  P   = the real, true, actual, honest-to-god proportion of blue balls
        in that great big f'ing urn
  p   = the sample proportion of blue balls, or 0.55
  n   = the sample size = 1000
  FPC = the "finite population correction" = sqrt((N-n)/(N-1)), where
        N = 120,000,000
and the 1.96 has something to do with the 95% probability area under a
standard normal distribution
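
For anyone without a frat brother on speed dial, here is a minimal Python sketch of that formula with the midterm's numbers plugged in. The function name proportion_ci and its return layout are made up for illustration; only the formula itself comes from the text above.

```python
import math

def proportion_ci(p, n, N, z=1.96):
    """Return (low, high, margin) for a sample proportion, FPC included.

    p: sample proportion, n: sample size, N: population size,
    z: normal critical value (1.96 for 95%).
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((N - n) / (N - 1))  # finite population correction
    margin = z * standard_error * fpc
    return p - margin, p + margin, margin

low, high, margin = proportion_ci(p=0.55, n=1000, N=120_000_000)
print(f"{low:.4f} to {high:.4f} (margin {margin:.4f})")
```

Note that this version uses the sample value p = 0.55 inside the square root, so its margin comes out slightly smaller than one computed with p = 0.5.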

That second part, after the "+/-", is what you know as the "margin of
error." Your frat brother texts you back and reminds you that since
the population is very large, the FPC is very close to 1 and can be
dropped. He also reminds you to use the conservative estimate of p =
0.5 in the margin of error calculation, since you don't know the true
value of p, only the sample estimate. So the whole formula simplifies
to

p +/- 1.96 * sqrt(.25 / n)

= p +/- 0.98 / sqrt(n)

Assuming you still have juice in your calculator batteries and you're
not hungover from the Sig Eps kegger last night, you should get

0.55 +/- 0.031
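
If your calculator batteries are dead, the same arithmetic can be checked in a few lines of Python. This is only a sketch of the simplification described above; the variable names are mine, not the midterm's.

```python
import math

n = 1000          # sample size
N = 120_000_000   # population size (balls in the urn)

# The finite population correction: how far from 1 is it really?
fpc = math.sqrt((N - n) / (N - 1))

# Conservative margin with p = 0.5, i.e. 1.96 * sqrt(.25 / n) = 0.98 / sqrt(n)
conservative_margin = 1.96 * math.sqrt(0.25 / n)

print(f"FPC    = {fpc:.6f}")
print(f"margin = {conservative_margin:.3f}")
```

The FPC differs from 1 only in the sixth decimal place, which is why dropping it costs nothing here.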

Now you could probably say you are 95% certain the real proportion of
blue balls in that great big f'ing urn is 55%, plus or minus 3.1%. If
you wanted to get extra credit points, you should probably say that
"95% of all random samples of this size will have a computed
confidence interval that contains the true population value." But
that's just quibbling and brown-nosing the professor, who's probably
late for a faculty meeting anyway.
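
The extra-credit version can be checked by brute force. The sketch below is a rough simulation, not anything from the original midterm: it assumes the urn is big enough that each draw is an independent coin flip with a 55% chance of blue, and the function name, trial count, and seed are arbitrary choices of mine.

```python
import random

def coverage(true_p=0.55, n=1000, trials=1000, seed=1):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    margin = 0.98 / n ** 0.5  # conservative 95% margin of error
    hits = 0
    for _ in range(trials):
        # Draw n balls; count how many come up blue.
        blue = sum(rng.random() < true_p for _ in range(n))
        p_hat = blue / n
        # The interval p_hat +/- margin contains true_p exactly when
        # p_hat is within margin of true_p.
        if abs(p_hat - true_p) <= margin:
            hits += 1
    return hits / trials

rate = coverage()
print(f"{rate:.1%} of simulated intervals contain the true proportion")
```

With well-behaved balls the rate should land in the neighborhood of 95%, give or take simulation noise.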

This is, for all intents and purposes, how political pollsters compute
the mysterious "margin of error," which has everything to do (and only
to do) with pure mathematical sampling error. If you look at the
formula above and round it just a smidge, you get a simple rule of
thumb for the margin of error of a sampled proportion:

Margin of Error = 1 / sqrt(n)

So if the sample size is 400, the margin of error is 1/20 = 5%; if the
sample size is 625 the margin of error is 1/25 = 4%; if the sample
size is 1000, it's about 3%.
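
To see how much that "smidge" of rounding costs, here is a quick Python comparison of the rule of thumb against the 0.98 / sqrt(n) version for the same sample sizes. The function names are just illustrative labels.

```python
import math

def rule_of_thumb(n):
    """Rounded margin of error: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

def conservative_margin(n):
    """Unrounded version with p = 0.5: 0.98 / sqrt(n)."""
    return 0.98 / math.sqrt(n)

for n in (400, 625, 1000):
    print(f"n={n}: rule of thumb {rule_of_thumb(n):.1%}, "
          f"unrounded {conservative_margin(n):.1%}")
```

The two differ by a factor of 0.98, so the rule of thumb overstates the margin by about 2% of itself.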

Works pretty well if you're interested in hypothetical colored balls
in hypothetical giant urns, or survival rates of plants in a
controlled experiment, or defects in a batch of factory products. It
may even work well if you're interested in blind cola taste tests. But
what if the thing you are studying doesn't quite fit the balls & urns
template?

What if 40% of the balls have personally chosen to live in an urn that
you legally can't stick your hand into?

What if 50% of the balls who live in the legal urn explicitly refuse
to let you select them?

What if the balls inside the urn are constantly interacting and
talking and arguing with each other, and can decide to change their
color on a whim?

What if you have to rely on the balls to report their own color, and
some unknown number are probably lying to you?

What if you've been hired to count balls by a company who has endorsed
blue as their favorite color?

What if you have outsourced the urn-ball counting to part-time temp
balls, most of whom happen to be blue?

What if the balls inside the urn are listening to you counting out
there, and it affects whether they want to be counted, and/or which
color they want to be?

If the answer to one or more of the above questions is yes, then the
formula for margin of error simplifies to

Margin of Error = Who the hell knows?

In this case, so-called scientific "sampling error" is
completely meaningless, because it is utterly overwhelmed by
unmeasurable non-sampling error. Under these circumstances "margin of
error" is a fantasy, a numeric fiction masquerading as a
pseudo-scientific fact. If a poll reports it -- even if it's collected
"scientifically" -- the pollster is guilty of aggravated bull**** in
the first degree.
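
Here is a toy Python illustration of just one of those problems, the balls who refuse to be selected. Everything in it is hypothetical: I am assuming the urn is truly 55% blue, that blue balls agree to be counted 40% of the time and red balls 60% of the time, and that draws are independent. Real non-response rates are exactly the thing nobody knows.

```python
import random

def biased_poll(true_blue=0.55, n=1000,
                respond_blue=0.4, respond_red=0.6, seed=1):
    """Share of blue among n respondents when response rates differ by color."""
    rng = random.Random(seed)
    blue = red = 0
    while blue + red < n:
        is_blue = rng.random() < true_blue
        # Each ball decides whether to cooperate, at a color-specific rate.
        cooperates = rng.random() < (respond_blue if is_blue else respond_red)
        if cooperates:
            if is_blue:
                blue += 1
            else:
                red += 1
    return blue / n

estimate = biased_poll()
margin = 0.98 / 1000 ** 0.5
print(f"poll says {estimate:.1%} blue, +/- {margin:.1%}; truth is 55.0%")
```

Under these made-up response rates the poll is expected to report about 45% blue (0.55 * 0.4 / (0.55 * 0.4 + 0.45 * 0.6) is roughly 0.449), so the reported interval misses the truth by far more than the advertised margin.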

The moral of this midterm for all would-be pollsters: if you are
really interested in how many of us red and blue balls there are in
this great big urn, sit back and relax until Tuesday, and let us show
our true colors.

Until then, fondle your own balls.
