But what if your baby really is ugly?


There's a debate raging over television ratings that I personally find fascinating. Aside from the fact that it could well affect the shows I get to watch (Alias, I already miss ya), the debate demonstrates why statistics, despite ranking third behind "lies" and "damn lies" among upsetting ways to make an argument, really are important.

Nielsen is planning on expanding the use of its PeopleMeter ratings system. The machines record all the activity a television undergoes while someone is watching it. No longer will "Nielsen Families" have to fill out little diaries with a list of shows watched for the week; now the machine records every flip between that guy who yells while cooking, those home decorating shows taking over the upper levels of my cable lineup, and an episode of JAG. Sounds like it would be a great step forward for TV ratings, right?

Well, as you can tell from the article, not everyone is happy:

Nielsen's explanation: Fewer people are watching those shows than the diaries showed.

"They don't accept that," said Nielsen spokesman Jack Loftus. "You can't tell me my baby's ugly."

Of course, those stations whose shows are now lower rated are carping, but that's to be expected. The hottest under the collar, however, are minority groups. Why, you might ask? Potential under-representation, I would respond.

However, it's a little more nuanced than you might think. The problem, apparently, isn't that the new system undercounts Latinos, blacks, Asians, or anyone else, really. The real issue, according to the spokesman from this story (audio news -- I thought it would be fun to throw in a curveball), is that the within-minority sample isn't appropriately representative. By this they mean that the people targeted as, say, "Asian viewers" don't accurately represent the population. The sample is off, they contend, on the characteristics: the average Asian in the population is more or less wealthy, spends more or less time watching television, and is more or less likely to speak English than the average Asian in the sample.

This could well be true, of course. Samples are almost always off by some factor, if only because of measurement error. More likely, however, is that there are distinct selection bias issues at play. The Nielsen system is opt-in, not opt-out: in order for a group of people to participate, they have to have chosen to do so. They select into the group, in other words. Bias -- a difference between the values of various characteristics in the sample and the "true" values in the population -- arises because the factors that inspire someone to agree to be part of the system may be systematically correlated with the very characteristics that distinguish them from the population average. Television watching is a leisure activity. Someone who has more time to spend in front of a television either can afford the time or has nothing else to do. Perhaps people who watch more television have more free time because their time not spent working isn't as valuable, in which case we'd expect a bias toward lower incomes in the participation group. Or perhaps people who agree to participate enjoy TV because they can afford a larger set, satellite television, or more TVs in the house; in that case we'd expect a bias toward higher incomes. The direction of the bias (higher or lower than the population average) need not go one way in particular; the deviation itself is what matters here. Education might produce similar problems: more educated families might prefer to read, watch movies, or play games, while less educated families might opt for more television viewing.
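The mechanism is easy to see in a toy simulation. This is only a sketch with made-up numbers -- the negative income/TV-hours relationship and the opt-in probability are assumptions for illustration, not anything Nielsen has published -- but it shows how opt-in participation alone can drag a sample's average income below the population's:

```python
import random

random.seed(0)

# Hypothetical population: income (in $1000s) and weekly TV hours,
# with TV hours assumed to fall as income rises (illustrative only).
population = []
for _ in range(100_000):
    income = random.gauss(60, 20)
    tv_hours = max(0.0, 35 - 0.3 * income + random.gauss(0, 5))
    population.append((income, tv_hours))

# Opt-in: assume the chance of agreeing to participate rises with TV hours.
sample = [(inc, tv) for inc, tv in population
          if random.random() < min(1.0, tv / 40)]

pop_mean_income = sum(inc for inc, _ in population) / len(population)
smp_mean_income = sum(inc for inc, _ in sample) / len(sample)

print(f"population mean income: {pop_mean_income:.1f}")
print(f"opt-in sample mean income: {smp_mean_income:.1f}")  # lower: selection bias
```

Flip the sign of the assumed income/TV relationship and the bias flips too -- which is exactly the point above: the direction isn't fixed, but the deviation is built in.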

But there's something else I bet you're now thinking (the two of you who came this far, that is): don't lower-income families have less of a chance to read, go out, or whatever, because more of them might be working longer hours? And might not more educated minority families have a greater number of English speakers, meaning they can more fully enjoy the programming on TV? Certainly! The covariance between these characteristics is important, since they so obviously affect each other directly. The problem is, this means the potential for even more bias in the sample.

Through no particular effort of the Nielsen group, the samples they get are necessarily going to differ from the population averages. Attempts to correct for these issues while sampling can get prohibitively expensive. Perhaps the critics, then, should focus more on just how biased the samples are. If the average in the sample is a Latino family of four with a combined income of $100,000, all fluent English speakers, while the population average is a family of seven in a single-income household earning $24,000 with only a couple of fluent speakers, then there might be strong reasons to change the sampling methods.
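One cheaper alternative to fixing the sampling itself is to reweight afterward. I have no idea whether Nielsen does this, but a standard technique is post-stratification: weight each respondent by (population share of their group) / (sample share of their group). A sketch with made-up income strata:

```python
# Hypothetical strata shares (made-up numbers, purely illustrative).
population_share = {"low_income": 0.50, "mid_income": 0.35, "high_income": 0.15}
sample_share     = {"low_income": 0.65, "mid_income": 0.25, "high_income": 0.10}

# Post-stratification weight: population share / sample share per stratum.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Suppose each stratum watches a given show at these (made-up) rates:
viewing_rate = {"low_income": 0.30, "mid_income": 0.20, "high_income": 0.10}

raw_rating = sum(sample_share[g] * viewing_rate[g] for g in sample_share)
weighted_rating = sum(sample_share[g] * weights[g] * viewing_rate[g]
                      for g in sample_share)

print(f"unweighted rating: {raw_rating:.3f}")        # 0.255
print(f"post-stratified rating: {weighted_rating:.3f}")  # 0.235
```

Because the opt-in sample over-represents the heavier-viewing stratum, the unweighted rating overstates the show's reach; reweighting to the population margins pulls it back down.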

This raises one other question, however: what's really wrong with the sample Nielsen uses? If there is a correlation between the kind of person who participates and the kind of person who watches more TV, might this not be an appropriate sample to use? After all, if you force your sample to look more like the general population, you might well be biasing your results in another direction. Under such a method, you end up grouping people who have no real predilection to watch television with those who do. In effect, you're erasing the connection between the rating system and the amount of viewing time. (Of course, this might be desirable in its own right.) What good does it do advertisers to look at ratings numbers that take into account people who rarely ever watch TV?

The issue here is one of defining the appropriate population from which to sample. In the best-case scenario, the people who become Nielsen families ought to be randomly selected from among the population of people who want to be one.

To my mind, that should be the simple defense of the new PeopleMeters. If designed correctly, the system measures the television-watching habits of a sample that most accurately represents the population of people who, on average, tend to watch a certain amount of television. After all, the ratings that come out daily, weekly, and potentially now hourly are best read as "The share of the television-watching population tuned to show Z at this time was X," not "The share of the general population tuned to show Z at this time was X."
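The two readings differ only in the denominator, but the denominator matters a lot. A quick arithmetic sketch with invented numbers:

```python
# Made-up numbers: the same viewer count against two different denominators.
general_population = 1000
tv_watching_population = 600   # assumed: people who watch TV in a given period
viewers_of_show_z = 90

share_of_watchers = viewers_of_show_z / tv_watching_population   # 0.15
share_of_everyone = viewers_of_show_z / general_population       # 0.09

print(f"rating among TV watchers: {share_of_watchers:.0%}")  # 15%
print(f"rating among everyone:    {share_of_everyone:.0%}")  # 9%
```

Same show, same 90 viewers -- a 15% rating or a 9% rating depending purely on which population you claim to be sampling from.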

Of course, that’s a hard pill to swallow for those who see their show ratings going down. The reduction of measurement error has revealed that, perhaps, they weren’t as popular as they once thought. If the sample is representative of the television watching population, however, the best explanation isn’t that the sample is wrong -- it’s that your baby just might be ugly.

(N.B.: Here’s a link to the Nielsen site, though it is particularly unhelpful about their sampling methods. Here’s an article from the always-entertaining Straight Dope column on the measurement of Nielsen ratings.)


Poltrack [VP Planning at UPN] acknowledged that the meters may be more accurate and the affected shows may simply be unpopular.

"It certainly is possible," he said.

Actually, it seems pretty likely--not just possible--that Nielsen's sampling procedure is no less accurate than before, but that automatic recording of shows watched yields tremendously greater reliability than diaries.

Also, I gather that the new system would be defended by those whose shows now have higher ratings, were it not for the fear that they might be perceived as racist? If such a fear of being labeled racist exists, wouldn't journalists be failing to accurately or adequately sample the range of opinion on this matter?


However, I think it might be better to say that journalists are getting an accurate sample of those who are afraid of being labeled racist, but are possibly not displaying an accurate sample of the general opinion of those who have shows under review. Again, it's a question of the population under consideration. In this case, we'd be more interested in the opinion of the population of all network execs, rather than of the group of execs who fear the labeling.

I can't top your reply, but I still want somebody in the business to say, "Maybe the ratings were just wrong before," or at the very least, "you know, my show, like, doesn't really suck as much as they said."


This page contains a single entry published on June 7, 2004 8:08 PM.
