Bad Surveys


I'd like to state my views on survey error clearly, but I have to write quickly, so you should expect some oddities in what follows:

IMHO, we have no reason (i.e. no statistical theory) to expect political polls to be consistent with one another. Most polls, although done cost-effectively, are not scientific enterprises; the pollsters do not, and cannot, take the care necessary to be reliable. I do not accept as valid their "margins of error" at any given level of confidence. Those numbers--48% this or that with a +/-3% margin of error--simply do not mean what survey authors suggest.
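That reported margin is nothing more than textbook sampling arithmetic. A minimal sketch in Python (the sample size is the textbook one for a +/-3% margin, not taken from any particular poll):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a
    proportion. This is pure sampling variance: it assumes a perfect
    simple random sample with full response and truthful answers."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of roughly 1,067 respondents at p = 0.5 gives the familiar +/-3%:
print(round(margin_of_error(0.5, 1067), 3))  # -> 0.03
```

Nothing in this formula knows anything about the frame, the response rate, or the honesty of the answers.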

In Stephen Landsburg's recent post, he excluded "all of the potential problems with surveys other than sampling error". Well, in my view, we can't exclude nonsampling error; nonsampling error actually dominates sampling error. Our focus on sampling variance is understandable, since that variance is what is seen. But it is the unseen that is of critical importance.

In polls, and in the recent hideous Iraq death toll survey, unmeasurable nonsampling error confounds any attempt to take the point estimate as "best"; reported sampling error bounds are NOT total error estimates. To get survey statistics accurate within precise and quantifiable confidence limits, you must conduct all aspects of survey procedure in exact accordance with the statistical theory used to structure the survey. That is, you must make sure nonsampling error does not enter the analysis in the first place. Some examples:

1) The frame (a master list of units to be surveyed) must include everyone in the relevant population.
2) The sample (the actual units chosen to be surveyed) must be drawn from the frame according to a predetermined but random method.
3) All units chosen in the sample must respond.
4) All questions must be clear, and all responses must be accurate.
5) All computations must be performed without error.
6) All results must be printed without error.

Any and all deviations from 1-6 will create unquantifiable errors, making the reported numbers and error margins larger or smaller than they would have been had the survey been conducted in exact accordance with statistical theory.

Assume 5 and 6 hold, and let's rename 1-4 in conventional sampling terms:

1b) Frame error - Every time somebody is left out of the frame, he has no chance of being selected, increasing the relative likelihood that others are chosen instead. This biases estimates, because those who are selected have their responses improperly weighted.
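A quick simulation sketch of 1b, with entirely made-up numbers: the frame covers only 70% of the population, and the missed 30% holds the opinion at a different rate.

```python
import random

random.seed(0)

# Made-up numbers: 40% of the in-frame group supports a measure, but
# 60% of the out-of-frame group (30% of the population) does. Only
# in-frame units can ever be selected, violating condition 1.
def draw_sample(n):
    return [1 if random.random() < 0.40 else 0 for _ in range(n)]

n = 1000
est = sum(draw_sample(n)) / n      # centered on 0.40, the in-frame rate

# The true population proportion mixes both groups:
truth = 0.7 * 0.40 + 0.3 * 0.60    # 0.46

# The frame bias (~0.06) exceeds the ~0.03 reported margin of error,
# and nothing in the sample itself reveals it.
print(round(est, 3), round(truth, 2))
```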

2b) Nonprobability sample - Selecting units nonrandomly means using a rule of judgement to select them; there is no reason to believe that such a selection method is impartial. This unknowable bias can have dramatic effects on estimates.

3b) Nonresponse error - Some people don't respond. People who don't respond are likely to differ from people who do respond in ways that matter to the analysis. Picking a substitute from within the respondents based on similar characteristics, or imputing responses based on previous answers, might lessen bias, but cannot eliminate it.
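A similar sketch for 3b, again with invented response rates: when the trait of interest predicts whether someone answers at all, filling in the gaps from the respondents just reproduces the respondents' mix.

```python
import random

random.seed(1)

# Made-up numbers: 30% of the population holds view A, but view-A
# holders answer the survey only half as often (30% vs. 60% response).
population = [1] * 300 + [0] * 700          # 1 = holds view A

def responds(v):
    return random.random() < (0.3 if v else 0.6)

respondents = [v for v in population if responds(v)]
observed = sum(respondents) / len(respondents)

# Imputing the nonrespondents from the respondents reproduces the
# respondents' mix (well under the population rate of 0.30).
print(round(observed, 2))
```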

4b) Response error - Some people (a) lie, (b) forget, (c) didn't have an opinion until asked, or (d) try to appease the interviewer.

Economists tend to group 1b-4b in a mass called "measurement error", as if their measuring scale were always "off" by a given amount. I think this is a terrible mistake on the part of economists, although I understand the need to simplify.

Unless 1b-4b are ignorable (i.e. made very, very small) which, as far as I know, HAS NEVER OCCURRED IN SOCIAL RESEARCH, a statistician can honestly state that his personal "best" estimate of Z (the population parameter) is X (the sample estimate), but he cannot honestly state that Z is between X-x and X+x with (1-a) confidence (where x is the appropriate multiple of the standard error).
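To illustrate the coverage claim, here is a sketch with an assumed, fixed nonsampling bias of four points (all numbers invented):

```python
import math
import random

random.seed(2)

# Made-up setup: the true proportion is 0.50, but a fixed nonsampling
# bias of +0.04 (say, systematic response error) shifts every answer.
def interval_covers(n=1000, truth=0.50, bias=0.04):
    hits = sum(random.random() < truth + bias for _ in range(n))
    p = hits / n
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - half <= truth <= p + half

# How often does the nominal "95%" sampling interval cover the truth?
coverage = sum(interval_covers() for _ in range(2000)) / 2000
print(coverage)   # well under the advertised 0.95
```

Even a bias of a few points, invisible in the data, wrecks the advertised coverage.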

For instance, do we really believe the Iraq study authors when they write:

"We estimate there were 98,000 extra deaths (95% CI 8000-194,000) during the post-war period."

That 95% figure lets the authors calculate the sampling variability in the absence of nonsampling variability, but it does not give us any reason to believe that nonsampling errors are ignorable. Who knows whether the actual number of dead is represented by the distribution given, since nonresponse was rampant, replacements were arbitrarily chosen, and respondents could give unverifiable information? This violates 2, 3, and 4.
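Taking the published numbers at face value, simple arithmetic (assuming the symmetric normal approximation the 95% label implies) shows how wide the sampling interval alone is:

```python
# Arithmetic on the published interval alone, assuming a symmetric
# normal approximation for the reported 95% CI:
lo, hi, point = 8_000, 194_000, 98_000
half_width = (hi - lo) / 2        # 93,000
implied_se = half_width / 1.96    # about 47,449

# Even before any nonsampling error, the sampling interval alone spans
# nearly +/-95% of the point estimate:
print(round(half_width / point, 2))
```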

Every time 1, 2, 3, or 4 is violated, X and x move in unpredictable ways. This creates variances and biases that shift and skew the reported sampling interval in unknown directions--biases and variances that we have no reason to believe will cancel out. Pollsters with a terrible frame and 20% response rates are not as bad as the Iraq study, but they still give me no reason to believe their results, as a matter of statistical practice.

This is not to say that good survey research is impossible. Under controlled conditions, it is possible to make nonsampling errors ignorable. But such errors always affect even the best social statistics, and opinion polls are far from the best.


Thank you.





About this Entry

This page contains a single entry by Kevin published on October 30, 2004 4:54 PM.
