Truck and Barter: Data Archives

July 13, 2005

Are bigger planes more deadly?

By Ian

I have no idea, really, but in the course of some work I ran across the following chart and was instantly fascinated:

Airline Accident Trends 1945-2004

(Source: PDF) Note that these numbers exclude non-accident occurrences such as bombing and hijacking.

The left-hand ordinate is the number of fatalities, and the right-hand ordinate is the number of accidents. While the trend indicates that the number of accidents has been on quite a descent, what I find more interesting is that the ratio of accidents:fatalities converges, but doesn't (on average) reverse, between 1985 and 1997. After an increase in the number of fatalities through to the mid 70s, the number has returned to levels just below those of the 40s. But the number of accidents is almost a third of what it was during the same time.

Not being overly familiar with the aviation industry, the only trends I'm familiar with are the growth in plane size, and the alternating shift to, and now more recently from, hub-based route architecture. With larger planes, each accident will claim more lives. (Which is similar to my retort about people who talk incessantly about flying being safer than driving -- I've spent a good portion of my life on a plane and have never had a fear of flying, but when you have a car crash, you don't often lose 280 people at once. In the event of an accident, I'd much rather take my chances in an automobile. I don't think I have the luck to come out this well.) Any ideas?

Posted at 03:39 PM | Comments (6) | TrackBack

June 30, 2005

What is our government reading?

By Ian

The products of the Congressional Research Service are being made public through the efforts of the Center for Democracy & Technology. They can be found at OpenCRS. Note that some reports can also be found at the site for the National Council for Science and the Environment (I don't know if the two are collaborating on concatenating the collections in any way or not).

Always interesting to see what kinds of information are being sent to our legislators...

Posted at 09:34 AM | Comments (0) | TrackBack

June 24, 2005

MIT Blog Survey

By Kevin

Go take the MIT Blog Survey. I did, and found out that I post much more frequently than many others.

Note that I did have to tinker with the banner so that it would fit my statistical philosophy.

Posted at 02:51 PM | Comments (0) | TrackBack

June 02, 2005

When Statisticians Laugh

By Kevin

So the CDC is sent into WV to analyze an outbreak of obesity. The author of the story goes through the routine, but then asks two statisticians what they think:

Dr. Daniel McGee, a professor of statistics at Florida State University who has analyzed obesity data, burst out laughing when he heard about it. "My God, what a strange thing to do," he said.

"They'll find out what we all know - that the country is no longer set up for physical exercise," Dr. McGee said. And that schoolchildren "don't get a nutritious diet." And that "there is a lot of high-fat food on the shelves of every supermarket."

But, he said, "that doesn't tell you much."

"I'm sure skinny people go to those same restaurants," Dr. McGee said. "Skinny kids go to those same schools."

Dr. David DeMets, a professor of biostatistics at the University of Wisconsin, was also extremely skeptical.

"We get a lot of false positives from that kind of investigation," Dr. DeMets said. "We get people worried," but there is no way to know whether what is found... has anything to do with the obesity epidemic.

"Perhaps it is true, perhaps it is not," Dr. DeMets said.

Priceless.

Posted at 09:28 PM | Comments (0) | TrackBack

May 26, 2005

Should GDP Revisions Make a Difference?

By Kevin

Once upon a time, a small revision in the GDP growth rate led to a very different newspaper storyline. Just remember that this is an all-too-real fairy-tale...

One month ago, on April 29, I defended my dissertation about error in GDP. That day the headlines were screaming that the economy was in the tank. The quarterly annualized growth rate of Q1 GDP had come in at 3.1%. Jeannine Aversa of the Associated Press wrote:

[T]he economy grew at an annual rate of just 3.1 percent in the first quarter. The slowest pace of expansion in two years was evidence of a new "soft patch."

The first-quarter's GDP figure, down from a 3.8 percent pace logged in the final quarter of 2004, represents the economy's most sluggish showing since the first quarter of 2003, when economic activity expanded at an even more mediocre 1.9 percent rate...

The newest snapshot of the economy disappointed economists. Before the report's release, they were forecasting a 3.5 percent growth rate for the first quarter...

["Soft Patch" is] the term Federal Reserve Chairman Alan Greenspan used last spring when economic growth slowed abruptly.

Today, Q1 GDP was revised upward to 3.5%, and Jeannine Aversa has found a new story:

"The 3.5 percent pace is really a safe and solid pace for the economy to grow. By that I mean, it is not so fast that you can have an inflationary accident and not too slow to create new jobs," said Stuart Hoffman, chief economist at PNC Financial Services Group. "It is right on the economy's speed limit."

Folks, 3.1% and 3.5% are economically and statistically indistinguishable. If you don't believe me, ask somebody from the BEA. There is almost no difference in the data, yet the story goes from "soft patch" to "safe and solid". This is not reporting reality; this is from the land of make-believe. Quite simply, GDP data aren't able to tell coherent stories with differences this small, especially when comparing two recent quarters.

Here's more:

The new reading is close to the 3.6 percent growth rate that economists were forecasting before the release of the GDP report.

The 3.5 percent pace clocked in the first quarter of this year -- while better than an initial calculation for the quarter -- still represented some slowing from the 3.8 percent pace seen in the final quarter of 2004.

Economists initially predicted 3.5%, and it came in at 3.1%. The next month, they predict 3.6% and it comes in at 3.5%, and this is slightly lower than the 2004Q4 of 3.8%. A few points are worth noting. First of all, these are very good predictions. Second of all, 2004Q4 growth itself was revised from an "advance" 4.0% to a "preliminary" 3.1% to the "final" 3.8%. Do you really think stories about short-term changes from 4% to 3% growth have any basis in reality? Third, what can stories based on these data possibly mean? Nothing. Absolutely nothing.

I cannot see how anybody's planning decisions have changed because of this revision. As a subjective rule, I cannot see how any revision smaller than +/-1.5% or +/-2% could affect public or private policies at all.

Posted at 10:16 AM | Comments (3) | TrackBack

May 25, 2005

Where Do You Talk Like?

By Ian

Those of you who click through any number of more "personal" blogs will recognize the concept: a display of results from some internet "survey" or another to figure out which of several "things" (Star Wars character, literary heroine, car, etc.) you might be most similar to. Along those lines I ran across one (I forget now on which blog I first saw it) that gave an answer as to the kinds of English one spoke, giving a percentage breakdown of Northern, Southern, Midwestern, Upper Midwestern, and more. Little did I know that this survey was being followed a little more closely than the one that told me that if I were an X-Man I would be Cyclops. (Clearly erroneous. As anyone who knows me would say, I would be a perfect Havok.)

From that survey the results were compiled and then displayed in numerous maps of the US, showing geographic concentrations of response types: Dialect Survey Maps and Results.

One thing I'm grateful for is that this survey finally recognized and demonstrated something I'd noticed while living in Ohio three different times--a very odd use of the word "anymore" that was hard to reconstruct for examples. Here are the results for whether or not people consider either the phrase "I do exclusively figurative paintings anymore", or the phrase "He used to nap on the couch, but he sprawls out in that new lounge chair anymore" to be grammatically correct. (I do not.) Notice the "Eastern Midwest" concentration of those who responded "acceptable". Pennsylvania, Ohio, and Southeast Michigan all seem to consider this linguistically ~~hunkey-dorey~~ hunky-dory (my goodness, what was I thinking?). But then, what can you expect from an area that drops its helping verbs? (To wit, from my Pennsylvanian father: "Why are you just sitting on the couch? The lawn needs mowed and your hair needs cut.")

Had I the time and the data I'd map these results against the spread of ethnic groups over time and the level of economic development in each area. That, and a good GIS program would help. As my grandmother used to say, "If I had some cheese I could have a ham and cheese sandwich if I had some ham."

Posted at 01:36 PM | Comments (0) | TrackBack

Lawmakers to DoD: Give Us the Data We Need to Refute You

By Kevin

I have returned to RAND, so I cannot comment on the substance of the charges that, essentially, DoD wants to close the wrong bases.

WASHINGTON - The entire Missouri delegation to Congress demanded Tuesday in a letter and a press briefing that Defense Secretary Donald Rumsfeld release information explaining his decisions to close dozens of military installations around the country, including the 131st Air National Guard Fighter Wing at Lambert Field....

Pentagon officials said they had posted a large amount of information since the weekend on their base closing Web site and also provided the data to the Base Realignment and Closure Commission....

Spokesmen... said that what's missing are "the underlying empirical data justifying the scores" given military installations, as well as cost-benefit analysis for closing specific bases.

Meanwhile, Illinois Gov. Rod Blagojevich asked that the Pentagon "immediately make all the data materials and computer models ... available for public review."

Let's assume the goal of everyone is to maximize defense capability within a reasonable budget constraint. (Stop laughing). Missouri officials believe that their bases are very important, and add much inframarginal and marginal value to national security, but they don't have independent data to prove it.

Then how exactly did they determine their prior belief that their bases are important? Why can't the public review their data and models and cost-benefit analyses?

-----

I'm getting tired of reading news stories in which X accuses Y of Z, Y denies Z, and the journalist makes little effort to find out if Z is true, or whether X or Y have solid reasoning and evidence behind them, or a huge personal and financial stake in the matter.

Posted at 09:03 AM | Comments (1) | TrackBack

May 16, 2005

Disintermediation in Medical Information

By Ian

I tend to like pharmaceutical advertising because I have a preference for more information rather than less (even if it is filtered through the lens of the seller). No, it's not an unadulterated good in all situations, but if I'm uncertain about the conditions under which I'm participating, I'll choose to believe that more information could be a help, especially in relation to medicine. All of which makes me appreciate the new patientINFORM service.

Under the patientINFORM web-based pilot project, when patients, their caregivers, or others visit the voluntary health organization websites with general questions and to read news stories and other web content created by the organizations to help interpret the latest research, they will also have the option of being connected directly to the source through links to free full text of the research articles on the journal websites. Healthcare consumers will be able to access selected journal articles as soon as they are published.

This comes via Slashdot, which has a host of interesting links to a number of places bringing more information to the masses.

Posted at 02:07 PM | Comments (0) | TrackBack

April 14, 2005

Interesting Resource

By Ian

Stumbled across this the other day doing some research: The Directory of Open Access Journals. It lists and links to those journals that offer -- surprise, surprise -- open access to their content.

I'm not convinced this is the best business model for academic journals, but that may be largely beside the point. Of specific interest to this audience, here's the link to Econ journals.

Posted at 09:18 AM | Comments (1) | TrackBack

April 11, 2005

I'm not a Certified Economic Accountant, but I Play One on My Blog

By Kevin

Don Boudreaux insists that the style and attractiveness of modern airports are not included in GDP. This is a debatable point. 30 years ago airports were no-frills affairs, but now have shopping malls, restaurants, cleaner facilities, etc. inside.

Let me answer with another question: Climate controlled luxury shopping malls (some provide free wi-fi) have increased the comfort, pleasure, and style of shopping. Is the increase in their attractiveness accounted for by GDP?

Answer: None of these "public space" issues are well-reflected in real GDP, although the cost of building them shows up properly in nominal GDP.

You might point out that taxpayers and consumers pay for these amenities, and this should be reflected in GDP. Then I'd reply that just because something is paid for (in nominal GDP) doesn't mean it's properly accounted for in real terms.

If prices increase because airports are masterpieces of design, then the "real" price paid for using an airport might very well be less, because you're now using the building as an airport and as a work of art. The marginally higher price is paying for an additional service: art.

In other words, Grand Central Station is more than a train station; at least, it is to me. Real GDP doesn't reflect this. I doubt that nicer spaces like Grand Central are accounted for in real GDP, as "niceness", like "good art", is not measurable. But I don't think real GDP can or should reflect my view of art. That's not what real GDP is for.

(Btw, Penn Station isn't more than a train station).

In Don's airport case, we have commerce coming to the rescue of dismal airports. Inasmuch as this commerce is done inside of domestic airports, final sales of goods sold there are included in nominal GDP, and the wages of the employees working there are in nominal GDI.

Airport stores and restaurants are not treated any differently than non-airport stores. It doesn't matter to the BEA whether Don purchases a Brooks Brothers suit inside of Reagan airport or inside of a local shopping mall. (I don't know how international airports' "duty-free" shops are handled, though I think no differently). If the BLS adjusts for quality change in the goods sold at these stores nationwide, then that quality change is reflected in real GDP.

Hence, the real question is whether the prices (and taxes) paid for airline tickets and goods in the airport somehow account, in real terms, for the nicer amenities at airports.

Prices serve to co-ordinate activities; they do not serve the purpose of measuring quality-adjusted value to the end consumer. They can be used for the latter only under a narrow set of circumstances. Simply put, the market isn't providing prices to the data agencies that adjust for quality change of public spaces. That's not the job of prices, nor of markets; it's the job of the agencies. And right now they don't do it (it's NOT easy)... I know of no way to verify this conjecture for you other than asking BLS and BEA statisticians if they are manually adjusting for nicer shopping spaces...

In fact, government services, like airports, are valued at COST, and there are horrendous difficulties in measuring the productivity of government enterprises. That airports have become joined with shopping malls makes things interesting.

Back to my question above, if you think nicer shopping malls are not reflected in real GDP, and modern airports have just integrated shopping malls, you might think that nicer airports have not been included in real GDP. If nicer airports have come about through government spending, then nicer amenities haven't been accounted for, since only the small imputed productivity increases in airport building would have marked down the cost of prettying them up.

(Note that V. Postrel recently noted the difficulty of filtering hotel room quality change from price change; Don's question is, in fact, harder, since airport amenities are not directly charged for.)

Sex and GDP

In my view, the use of aggregate economic data had already gotten out of hand by the 1950's. We have to get back to basics, and understand what the data are actually trying to measure.

What does GDP purport to measure? NOT WELFARE and NOT LIVING STANDARDS. Get over it, folks! As much as some would like it to be, GDP is not a people-experience counter. GDP is ONE account of the value of goods and services produced and exchanged via markets and governments. It does not put a value on most intrafamily exchange and production, although farmers growing food for themselves do have the value imputed. The standard example of non-counting is housekeeping; if it ain't paid for, it ain't counted. If it is paid for, it is. Quality change is not really much of an issue for housekeeping.

But to see in greater detail how arbitrarily some things are counted and others aren't, how about a more controversial example: sex. If it ain't paid for, it ain't counted. In fact, because it is an illegal activity, even paid sex is not included in the U.S. GDP. (Moving to the System of National Accounts will change this, and I'd like to see how the BLS will get its prices!) When counted, I have no personal or academic knowledge of whether quality change will be a "problem" with prostitution.

Don't think paid sex is an important service? Some say sex, drugs, and smuggling would increase Swedish GDP by only 0.2%. Others have noted a much larger figure for elsewhere, "According to Marilyn Waring, the sex industry accounted for about 14 per cent of the GDP of Indonesia, Malaysia, the Philippines, and Thailand in 1998". This puts economic growth in a new perspective, no? Prostitution could also be about 2% of GDP for those countries; it depends who you ask. Poland and the UK both estimate about 0.2% of GDP is prostitution. I get the feeling that 0.2% is just an assumed nuisance value, like "50% unemployment in Iraq".

I ain't no Expert

Please note that Don is on my dissertation committee, so he knows that I cannot pass up a chance to discuss error in macroeconomic data.

First of all, no economist can truly be an expert in GDP; the economy is too massive and complex for one person to understand how to make GDP. This is for the same reasons, but with far greater potency, that nobody knows how to make a pencil.

I'll grant that you can understand the outline of all the processes of national accounting. But no man can know what to do with all the prices, quantities, formulae, regressions, guesses, estimates, and the like for a 300 million person strong, complex flux of growth and decay. How do I know this? Because I tried! How are all these elements put together? How wide in scope is each available data series? What means are used to measure the different goods and services? How do we adjust for the birth and death of firms? How do the scope and quality of estimates differ from last quarter, last year, last decade? What is missing from these estimates?

The matter of economic accounting is so complex that someone just proposed the idea of a "Certified Economic Accountant". Really:

Hence we advocate a Certified Economic Accountant (CEA) degree or diploma program to gain enhanced recognition and greater understanding for national economic accountants and their work.

Posted at 01:44 PM | Comments (2) | TrackBack

March 29, 2005

Yao is Irrelevant

By Kevin

Don Boudreaux links to Bryan Caplan's clear explanation of the danger of misinterpreting averages, and writes about an example he uses in Econ 101:

I use average height to explain to my students the problem with taking averages at face value. Suppose the average height of my class of 200 students is calculated and turns out to be 5’8”. Then let Yao Ming walk into the classroom. Because he is 7’6” tall, he will increase the average height of people in the classroom – but do nothing to the heights of any individual in the classroom.

The logic makes sense to me, and is a good point to make, but adding one person with an extreme attribute to a large group will usually have little effect on the resulting mean value.

I made that point when measuring the average hourly pay of Wal-Mart workers. Adding in the $10 million salary of WM's CEO H. Lee Scott increases the hourly wages of a million Wal-Mart employees by about half a cent an hour. This is irrelevant for almost all purposes. As I wrote, the median and the mean are close enough for all but nit-picking.
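A rough sketch of the Wal-Mart arithmetic, in Python. The salary and headcount are the round figures from the paragraph above; the 2,000 paid hours per employee per year is purely my assumption for illustration (about 40 hours a week for 50 weeks), and the variable names are made up.

```python
# Spread one large salary across the total hours worked by everyone else.
ceo_salary = 10_000_000      # dollars per year (round figure from the post)
employees = 1_000_000        # round figure from the post
hours_per_year = 2_000       # ASSUMPTION: paid hours per employee per year

total_hours = employees * hours_per_year
extra_per_hour = ceo_salary / total_hours          # dollars per hour

print(f"extra per hour: {extra_per_hour * 100:.2f} cents")
```

Under that assumption the bump works out to half a cent an hour. Halving the assumed hours only doubles it to one cent, so the conclusion does not hinge on the exact figure.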

I'll make the same point with adding Yao to Econ 101. 7’6” Yao Ming will raise the mean height of Don's 5’8” 200 student class by approximately .11 inches. The new mean is 5’8.1’’. All this means is that whether or not Yao is added is irrelevant for almost all purposes of measurement -- but is extremely important for fielding a basketball team from Don's students.

(Here's the arithmetic: 200 students at 5'8'' yields 13600 total inches. Adding in 7'6'' Yao yields 13690 inches. Dividing by 201 yields 68.11 inches on average -- or 5'8.11'')
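For anyone who wants to try other class sizes or other giants, here is the same arithmetic as a small Python sketch. The function name is mine, made up for illustration; the inputs are the ones used above.

```python
def mean_after_adding(n, mean, newcomer):
    """Return the mean of a group of n after one more member joins."""
    return (n * mean + newcomer) / (n + 1)

class_size = 200
class_mean_in = 5 * 12 + 8       # 5'8" is 68 inches
yao_in = 7 * 12 + 6              # 7'6" is 90 inches

new_mean = mean_after_adding(class_size, class_mean_in, yao_in)
shift = new_mean - class_mean_in

print(f"new mean: {new_mean:.2f} inches, shift: {shift:.2f} inches")
# prints "new mean: 68.11 inches, shift: 0.11 inches"
```

The shift is just (newcomer - mean) / (n + 1), so it shrinks as the class grows: the same Yao in a class of 20 would move the mean by about an inch.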

Posted at 11:02 AM | Comments (0) | TrackBack

March 14, 2005

It's that time of year again...

By Ian

For those of you who are into the whole b-ball thing this might be of interest: the Dancecard Rankings from Jay Coleman and Allen Lynch (the link for Allen Lynch on the page is his email -- not using it because I don't think we need to increase his chances of receiving spam).

I'm more of a football and lacrosse guy myself, but from this perspective, I can get into just about any sport. I know baseball is usually the sport that attracts people fascinated with data work, but I think that area's pretty well covered. Anyone who has good data sets for college lax and can send me a file/link is up for a beer or several if you're ever in the DC area....

Posted at 01:28 PM | Comments (0) | TrackBack

March 04, 2005

my incompetence with Movable Type

By William

I got confused about how Movable Type works, forgetting that "save" by default means "publish". Thus I have now published two copies of an early draft (broken HTML and all) of something I wasn't sure that I wanted to publish anyway.

Argh.

original version: I have spent twenty minutes or so trying to figure out how to make them go away, with no success so far except deleting the "entries" and a correct (I hope) understanding of what happened (and that a published article is evidently a separate copy of the "entry" it was made from). If someone wants to take mercy on me, please feel free to delete the published articles. Meanwhile, I'll probably continue trying on my own for a while.

updated and hopefully-final version: They seem to be gone now, yay. Perhaps someone took mercy on me (but didn't send an email?) or perhaps they were in fact deleted when I deleted the entries, and I was having some sort of (cache?) problem which kept me from seeing the change on the main Truck and Barter page.

Posted at 08:59 PM | Comments (1) | TrackBack

February 15, 2005

More On Randomness

By Ian

Ehhh...I would say "Deep in the basement..." is about as good a starting line for an article about science as "It was a dark and stormy night..." is for horror.

Nonetheless, this was at least an entertaining article about a number of black boxes generating random numbers that some claim can predict some not-so-random events:

The machine apparently sensed the September 11 attacks on the World Trade Centre four hours before they happened - but in the fevered mood of conspiracy theories of the time, the claims were swiftly knocked back by sceptics. But last December, it also appeared to forewarn of the Asian tsunami just before the deep sea earthquake that precipitated the epic tragedy.
Now, even the doubters are acknowledging that here is a small box with apparently inexplicable powers.

'It's Earth-shattering stuff,' says Dr Roger Nelson, emeritus researcher at Princeton University in the United States, who is heading the research project behind the 'black box' phenomenon.

And, for balance, here's something from a dissenting opinion:

September 11th: A study in wishful thinking.
It was obvious that the terror attacks of that day should make a pretty good case for Global Consciousness (GC). On the surface, it did. There seemed to be a very pronounced effect on that day and in the time right after.

There were, however, several problems. The most obvious was that the changes began at 6:40am ET, when the attacks hadn't started yet. It can of course be argued when the attacks "started", but if the theory is based on a lot of people "focusing" on the same thing, the theory falls flat - at 6:40am, only the attackers knew about the upcoming event. Not even the CIA knew. Hardly enough to justify a "global" consciousness.

Perhaps this is an uneducated question, but wouldn't 30 years of continually generating random numbers result in plenty of oddly large/sustained deviations away from the expected 50/50 distribution of 1s and 0s?
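One rough way to poke at that question is to simulate it. The Python sketch below knows nothing about how the actual black boxes work; the window size, the number of windows, the threshold, and the function name are all my own assumptions for illustration. It draws fair random bits in fixed-size windows and counts how many windows stray from 50/50 by more than two standard deviations.

```python
import math
import random

def count_big_deviations(n_windows, window_bits, z_threshold, seed=0):
    """Count windows of fair random bits whose number of 1s strays far from half."""
    rng = random.Random(seed)            # seeded so the run is repeatable
    expected = window_bits / 2
    sd = math.sqrt(window_bits) / 2      # std. dev. of the count of 1s in a window
    hits = 0
    for _ in range(n_windows):
        ones = bin(rng.getrandbits(window_bits)).count("1")
        if abs(ones - expected) / sd > z_threshold:
            hits += 1
    return hits

n_windows = 10_000
hits = count_big_deviations(n_windows, window_bits=1_000, z_threshold=2.0)
print(f"{hits} of {n_windows} windows deviate by more than 2 standard deviations")
```

For a perfectly fair source, roughly one window in twenty should cross the two-standard-deviation line by chance alone. Over years of continuous output that adds up to a great many "odd" stretches, some of which will land near a newsworthy event.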

Posted at 03:22 PM | Comments (3) | TrackBack

January 31, 2005

More Reason to Like the WSJ Online: The Numbers Guy

By Ian

I've been enjoying this series, and forgot to mention it to T&B readers.

The Numbers Guy, over at the Wall Street Journal.

Here's just a quote from the top page I think Kevin, given his recent focus on errors in data, might appreciate:

Jan. 28, 2005 Some 63,135 cellphones were abandoned in the backseats of London taxis over the last six months, according to a quirky survey that made headlines recently. The precision of that number should be your first clue something's amiss.

Posted at 04:26 PM | Comments (1) | TrackBack

January 25, 2005

Quick Book Review: The First 40 Pages of The Secrets of Economic Indicators by Bernard Baumohl

By Kevin

After two hours of reading The Secrets of Economic Indicators, I must regrettably write an excoriating and punishing initial review of this book. But before my tone and goals are misunderstood, you should know that I highly recommend that you read and study this book, at least up to page 40. Dollar for dollar, there is no better introduction to the current beliefs and attitudes about the use and utility of economic data. If you ever wanted to understand why bond traders rip limbs out when the jobs report is poor, read this book.

Frankly, it's not Baumohl's fault that his book inspires no confidence in me, and left me baffled about exactly how I should incorporate economic data into my decisions. In many ways, the entire subject of economic indicators is stale and corrupt, almost beyond redemption, built on fundamentals that are shaky, and yielding doubtful nonsense. But that's life, so let's get on with it.

First off, Mr. Baumohl sparkles in prose, with a readability second to none; the man can write. He chucks overboard tons of refuse, but his remaining cargo is infested by rats. He piles through indicator after indicator, talking about importance, construction methods, where to find the important stuff online, revisions, and a release's impacts on bond, stock, and international money markets. And he makes it easy to follow.

But for me, the real moral value of Mr. Baumohl's tale is to confirm for the reader that he does not need to bother with data; in fact, instead of worrying about last quarter's GDP of the U.S., Germany, or Japan, the reader should be traveling there to experience those economies first-hand. This is because 1) the data don't actually represent much that is enjoyable about an economy--you don't learn about skyscrapers by looking at construction data--and more importantly 2) other people--investors and economists--are no doubt quicker and better at using this data than you will ever be.

Don't be worried that the book informs its readers about how people use macroeconomic theory to digest macroeconomic data, regardless of the quality of either. That's how it works. Accept it.

Remarkably, Baumohl invites readers to ignore the experts--like those that advised the mal-investments leading to the .com bubble--if they'll do the dirty work themselves. He wants readers to know that out of the seeming infinity of data available, some indicators "have established a track record for being able to predict how the economy will behave during the next 12 months" (xix). He compiles sources for U.S. and international data, although Statistics Canada might be justifiably annoyed at this judgment that "No country collects and disseminates as much high-quality economic information as the U.S. Its breadth and integrity make it the gold standard in the world." That's true but misleading, as in the judgment of other statisticians, Statistics Canada ranks higher than the U.S. in terms of data quality, and Australia might come in a close second.

The first chapter (available free online, see Mahalanobis) begins with a cute story of the process of data embargo, with journalists given the data at 8AM, and a release to the public at 8:30AM. Traders, who have already, in essence, placed bets on the outcome, react to the news. So do policymakers. Sometimes, when the numbers come in far from forecasts, hilarity ensues...

What really got under my skin is that Baumohl conflates the actual history of economic activity with the collection and dissemination of that data:

All will eventually feel the fallout from the news that came from the Labor Department's press room that morning. That fallout will produce a mixture of both favorable and unfavorable developments. (6)

That release was a compression of history into a single figure. Will economic actors in the future be responding to the complexity of activity behind that figure, or to the figure itself? Perhaps it doesn't matter, but I ask what would happen if the BLS and BEA shut down tomorrow? Would economic actors no longer respond to the economy because it wasn't contained in official news reports?

Also, Baumohl notes that the reactions to this single release are not permanent. They will be modified in a fundamental way by future releases. In the story, predictions that a release of X will have some short-run effect are made without reference to tomorrow's release, which will once again change future plans. Hence, all this talk about impact on markets is intended to explain short-run movements, most of which amateurs would lose money on, if they were to try to time their investments.

The book is a solid introduction to domestic and international economic data, though it is an introduction. Most honorable is Baumohl's emphasis on the impact data releases have on the interconnection of markets, and the ever rising importance of international trade.

However, the most terrifying aspect of this book is that there are no footnotes, no endnotes, no documentation, and no sources. The index is feeble. I am expected to trust Baumohl that the first of the "economic indicators most sensitive to stocks" is the payroll survey. Sorry, but I like looking at original research...

One paragraph was absolutely infuriating:

Of course, to many investors, it makes little difference whether the intial data is realible. They'll trade on these numbers anyway because the figures represent the very latest information they can get on the economy. Later, though, as more information is received and after statisticians have had a chance to review their computations, the preliminary figures undergo one or more revisions. Though revisions to earlier data are also read by investors, they generally do not spark much trading because by then the information refers to a time period that has long since passed. Investors usually focus on the future, not the past. Economists, however, take revisions more seriously because the new figures can affect their forecasts of economic activity.(21)

OK, let's figure out how bad data are useful to investors, but late data aren't. 1) All data are about the past. Do data about the past matter or not? Of course, but only the really, really recent past??? 2) If the data are not reliable at all, then they do not contain information, they contain misinformation, and should be disregarded. But they are regarded well, so they contain some useful information about the past. But revisions contain even more information about the past. Why is it that a past that is probably an additional month ("long since"???) or so older is no longer relevant? Isn't this quite arbitrary? What would happen if data were revised the next day???

Perhaps investors react because they expect others to.

Posted at 04:15 PM | Comments (4) | TrackBack

The Biggest Secret of Economic Indicators

By Kevin

The biggest secret of economic indicators is how to profit by using them. In fact, it's so secret that economists and journalists who write about economic indicators don't tell it to their readers.

This was brought to mind when I saw that Tyler Cowen just linked to a book review of The Secrets of Economic Indicators by Bernard Baumohl.

Although I have not read this book (yet), over the past 6 months I have read a half dozen books about economic indicators, all of which claim that economic data are essential for proper decision-making in the short-run and long-run, and none of which tell the reader how he can use the data profitably.

This book might be different, though I'll reprint an excerpt from the review that leads me to a pessimistic outlook:

But why should anyone other than Alan Greenspan care about economic indicators? "Because these are vital barometers that tell us what the economy is up to and, more importantly, in what direction it is likely to go in the future," Baumohl says. He characterizes them as essential knowledge for investors worried about their portfolios, company chief executives trying to make business decisions they can justify to shareholders, and workers just trying to gauge the health of their industry.

I like the idea of continual interaction between people and data: in one period, everybody's economic activities are recorded, and in the next period, their economic activities are based on knowledge of everyone's past activities as well as future plans. I just don't know how much an improvement would occur if people dropped their apparently puerile attachments to making decisions without reviewing macroeconomic data. As Richard Wagner impressed upon me, just how did smart people make smart decisions before such data were available?

In other words, what is the potential value added of these data--in billions of dollars--if everyone knew their secrets? Is there evidence that macro data have increased macroeconomic coordination, and hence GDP? How much can we gain through the persistent devout following of data releases? I have not seen one good answer to these questions! Perhaps this is all a large waste of time? Given the data, how likely are we to guess CORRECTLY the direction of the economy? Without the data, how likely are we to guess correctly? If I were working with Vernon Smith and the experimental economists at GMU, I would suggest that a large-scale economic experiment be conducted in order to measure how valuable macro data like "retail sales" and "initial unemployment claims" is to micro agents.

All macroeconomic data are vestiges of the past, some of a week ago, many from last month, a few from last quarter or last year. If the data indicate small changes have occurred, you need a subtle theory and a calculator to make a conclusion about the likely direction of the economy in the future. If the data indicate large changes have occurred, I ~~think~~ hope most people in an industry will already have spotted the difference in activity and have adjusted...

My advice: you should follow economic indicators if 1) you know how to profit off using them more than you could profit doing something else with your time, or 2) you genuinely enjoy messing around with economic data--making forecasts, and pretending your forecasts are accurate, or 3) you like following the politics of economic data, or 4) just doing something to manage your portfolio makes you feel better.

Posted at 10:32 AM | Comments (0) | TrackBack

December 30, 2004

Christmas Deadweight Loss: Objectively Better Gifts?

By Ian

Not sure why I didn't bring this up earlier, and now I've decided to make it a whole post rather than just a tack-on to the original post below.

Take a look at SwapAGift.com, and click on a few of the swaps-a-lot merchandisers. You'll see a list of current cards available and at what price they can be purchased.

Could the difference between the value on the card and the price the card finally sells for serve as a measure of approximate loss on this kind of gift? For instance, on a Pottery Barn card, the approximately $8.50 average difference between actual card value and the price it can be bought for might be a valuation for the difference between the monetary value laid out by the purchaser and the receiver's value -- a decent portion of the deadweight loss. Further, one might start to stratify this according to merchandiser type: compare a weighted average difference between stored value and buy prices (weighted according to average size of card to account for the kind of goods, like comparing Tiffany's versus Target, as well as the number of cards sold to take some measure of the volume of trade) to see where losses are greater or lesser relative to some standard such as cash. Those closer to cash's value might be the "better" gifts since they tend to exhibit relatively less loss (on this one measure) than others.
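Here is a rough sketch of that comparison in Python. The listings are invented for illustration and are not SwapAGift.com data; the names `Listing` and `discount_rate` are hypothetical, and weighting by face value is only one of several reasonable choices.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    merchant: str
    face_value: float  # value stored on the card
    sale_price: float  # price the card actually sells for


def discount_rate(listings):
    """Face-value-weighted average discount: total loss / total face value."""
    total_face = sum(l.face_value for l in listings)
    total_loss = sum(l.face_value - l.sale_price for l in listings)
    return total_loss / total_face


# Hypothetical listings, for illustration only.
listings = [
    Listing("Store A", 100.0, 90.0),
    Listing("Store A", 50.0, 46.0),
    Listing("Store B", 200.0, 150.0),
    Listing("Store B", 100.0, 80.0),
]

# Group listings by merchant, then compute one discount rate per merchant.
by_merchant = {}
for listing in listings:
    by_merchant.setdefault(listing.merchant, []).append(listing)

rates = {m: discount_rate(ls) for m, ls in by_merchant.items()}

# A lower rate means the card trades closer to cash, i.e. less loss on this one measure.
ranking = sorted(rates, key=rates.get)
```

Cash would score a rate of zero on this measure, so merchants near the top of `ranking` are the ones whose cards lose the least in resale.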

Of course, this kind of thing always raises more questions. Is this difference driven by trends, such as higher demands for electronics one year, home furnishings or jewelry the next? Does the distribution of stores impact the desirability of certain categories, as it might be easier to get to stores that are closer by? What about the effect of income levels? Rural vs. urban?

Or maybe I just need to lie down.

Posted at 04:30 PM | Comments (5) | TrackBack

December 22, 2004

2004Q3 GDP Rises from 3.7% to 3.9% to 4.0%

By Kevin

Clockwork.

Last month, I noted a ridiculous AP story trumpeting an economically insignificant increase in 2004Q3 GDP growth. It made "above the fold" on The New York Times online. Of course, the AP is at it again with an even smaller change:

The economy revved up its engine in the third quarter and advanced at an annual rate of 4 percent-- even faster than previously thought.

The new reading on gross domestic product, released Wednesday by the Commerce Department, exceeded the previous estimate of a 3.9 percent growth rate for the July-to-September quarter. It marked the best showing since the opening quarter of this year and was up from a 3.3 percent pace in the second quarter....

The new GDP figure, based on more complete data, was better than economists were forecasting. They were predicting economic growth would remain at the 3.9 percent pace estimated a month ago.

This is perhaps the purest example of a regularly scheduled press release becoming news, regardless of its importance... Full release here...

Posted at 10:58 AM | Comments (0) | TrackBack

December 20, 2004

NBER Working Paper: CPI Bias from Supercenters

By Kevin

Jerry Hausman and Ephraim Leibtag have a really neat NBER working paper, CPI Bias from Supercenters: Does the BLS Know that Wal-Mart Exists? ($). Of course, the question is facetious and deceptive. Their inquiry is really about whether the sample of prices in the CPI is actually representative.

The abstract indicates that they aren't:

Hausman (2003) discusses four sources of bias in the present calculation of the CPI. A pure price index based approach of surveying prices as done by the BLS cannot succeed in solving the problems of bias. We discuss economic and econometric approaches to measuring the first order bias effects from outlet substitution bias. We demonstrate the use of scanner data that permits implementation of techniques that allow the problem to be solved. In contrast, the current BLS procedure does not treat correctly outlet substitution bias and acts as if Wal-Mart does not exist. Yet, Wal-Mart offers identical food items at an average price about 15%-25% lower than traditional supermarkets. The BLS "links out" Wal-Mart's lower prices. We find that a more appropriate approach to the analysis is to let the choice to shop at Wal-Mart be considered as a 'new good' to consumers when Wal-Mart enters a geographic market. This approach leads to a continuously updated expenditure weighted average price calculation. We find a significant difference between our approach and the BLS approach. Our estimates are that the BLS CPI-U food at home inflation is too high by about 0.32 to 0.42 percentage points, which leads to an upward bias in the estimated inflation rate of about 15% per year. (Emphasis added).

Some detail on this process from the meat of the paper, which I have yet to read in full:

Various studies have demonstrated that food items at Wal-Mart are 8%-27% lower priced that at the large supermarket chains, even after discounts for loyalty card and other special are taken into account. After entry by Wal-Mart conventional supermarkets typically decrease their prices (or do not increase them as much as in non-Wal-Mart markets) because of the increased competition. Remarkably, the large expansion and continuing expansion of Wal-Mart and other supercenter food outlets has almost no effect on the BLS calculation of the CPI for food.

The BLS employs a “linking procedure” that assumes “quality-adjusted” prices at Wal-Mart are exactly equal to prices at conventional supermarkets. Thus, when a Wal-Mart store replaces, say a Kroger, in the BLS sample of stores from which it collects prices, it links the lower Wal-Mart price to the higher Kroger price to remove any difference. Even though packaged food items are physically identical at the two stores, the BLS procedure does not recognize any price difference between the stores. This procedure is not based on any empirical study. Rather, it is based on mere assumption. The assumption is completely inconsistent with actual real world market outcomes where Wal-Mart has expanded very quickly in markets that it entered. Thus, Wal-Mart and other supercenters are nowhere in the food CPI so that we find that the BLS does not know that Wal-Mart “exists” in terms of the estimation of a CPI. We also believe that observed consumer behavior cannot be explained by the BLS assumption of a compensating “quality differential.”

This is really neat, but have studies really found that Wal-Mart prices are 8%-27% lower? Yes, they have. We reprint footnote 5:

A recent December 2003 study by UBS Investment Research found a price gap of 17.3% to 26.2%, “Price Gap Tightens, Competition Looks Hot Hot Hot.” The previous year UBS found a price gap of 20.8% to 39.1%. For example for a specified identical market basket UBS finds Wal-Mart supercenters to have an average price 19.1% less expensive in Tampa and 22.8% less expensive in Las Vegas. In 2002, Salomon Smith Barney estimated the price gap to be between 5% and 25%. See L. Cartwright, “Empty Baskets,” September 12, 2002.

Hausman and Leibtag have a separate paper analyzing consumer benefits from entry by supercenters, with identical sources...

Posted at 05:02 PM | Comments (0) | TrackBack

November 30, 2004

2004Q3 GDP Rises from 3.7% to 3.9%

By Kevin

Jeannine Aversa of the AP files an absolutely meaningless report:

The economy - helped out by more brisk consumer and business spending - grew at an annual rate of 3.9 percent in the third quarter, a performance that was stronger than previously thought.

The new reading on gross domestic product, which is based on additional data, was up from the 3.7 percent growth rate first estimated for the July-to-September quarter, the Commerce Department reported Tuesday.

GDP measures the value of all goods and services produced within the United States and is considered the broadest barometer of the economy's health.

The 3.9 percent growth rate registered in the third quarter represented a pickup from the second quarter's 3.3 percent pace and marked the best showing since the opening quarter of this year.

A revision of +0.2% is not good news, it's noise: an irrelevant and economically meaningless statistical aberration. There is, practically speaking, absolutely no difference between 3.7% and 3.9% quarterly growth at an annual rate, even if the BEA can allegedly pinpoint where it previously undercounted. In fact, there is, practically speaking, no difference between 3.9% and 3.3%. GDP measurement is not that accurate!

Journalists seem to have no background in errors of economic data; so here are two rules of thumb: 1) the average change in GDP from the first to third release is about +/-0.5%, and 2) revisions after that (which are uncorrelated with the first two revisions) revise estimates an average of +/-1.0%. Anytime you see changes smaller than that, they're more likely than not to be eliminated by later revisions.
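The two rules of thumb can be turned into a quick check. This is only a sketch: the thresholds are the ones stated above, the function name `is_within_noise` is made up for illustration, and "smaller than the average revision" is a crude screen, not a formal statistical test.

```python
# Rules of thumb from the post, in percentage points of annualized growth.
AVG_REVISION_FIRST_TO_THIRD = 0.5  # average change from first to third release
AVG_LATER_REVISION = 1.0           # average size of revisions after that


def is_within_noise(old_estimate, new_estimate,
                    typical_revision=AVG_REVISION_FIRST_TO_THIRD):
    """Return True if the change is smaller than the typical revision size."""
    return abs(new_estimate - old_estimate) < typical_revision


# The 2004Q3 revision discussed above: 3.7% revised to 3.9%.
q3_revision_is_noise = is_within_noise(3.7, 3.9)

# Q2 versus Q3 growth (3.3% vs. 3.9%), judged against each threshold.
q2_vs_q3_early = is_within_noise(3.3, 3.9)
q2_vs_q3_later = is_within_noise(3.3, 3.9, AVG_LATER_REVISION)
```

On these thresholds the 0.2-point revision falls inside the noise band, while the 0.6-point gap between the second and third quarters only does so when judged against the larger, later revisions.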

Note to The New York Times web editor: just because it sounds important doesn't mean it should be on the front page.

Posted at 09:29 AM | Comments (0) | TrackBack

November 15, 2004

Google and Known Unknowns

By Paul

Some time back Brad DeLong had an interesting post about searching and creating metadata which is worth quoting in detail:

…Let's take Donald Rumsfeld's four categories: the known knowns, the known unknowns, the unknown knowns, and the unknown unknowns:

The known knowns: The things that you know, and that you know that you know. Here there is no information retrieval problem at all.

The known unknowns: These are the things that you know are on your hard disk someplace, but you're not sure where they are or what, exactly, they say. Your recollection needs to be refreshed. Here is where search based on full-text indexes plus high-quality metadata shines. We know how to make full-text indexes. We know how to search such indexes plus metadata. The only potential problem is a social engineering one: how to make sure that high-quality metadata about files is created and maintained.

The unknown knowns: Once you have found your known unknown, you then want to find what other files on your hard disk are related to it. The same keyword and text search won't necessarily pick them up. This is what subdirectories--folders--are supposed to be for: one of the benefits of grouping related files in subdirectories is that one can then thrash about and get hold of related information. And, because one file may well belong to more than one possible group of unknown knowns, we have symbolic links--aliases. Once again, however, there is a social engineering problem: how to make sure that files are sorted into the right folders and that the right symbolic links are created, for this task can also be "tedious in the extreme." And we are vain and lazy infovores.

The unknown unknowns: These are things that one would search for if one remembered enough about what was on one's hard disk (or knew enough about what was on the web) to know that one should look for them. Here we have a very difficult problem: how do you jog someone's memory or tell them enough about what is known so that they can figure out what kinds of things they can search for? I think that this is a very hard problem indeed.”

I think products like Google’s Desktop Search fit into the ‘known unknown’ category above. David Pollard further expands on Google’s foray into ‘Personal Content Management’ tools. More on the competition between Google and Microsoft is described in this Economist article.

My favorite desktop searching tool is Copernic Desktop Search. Together with Google it is a good combination. A list of similar products is given by DeLong. I wonder why our IT departments didn’t recommend these; they ought to be concentrating on improving personal productivity.

Posted at 09:33 AM | Comments (3) | TrackBack

November 05, 2004

Number Employed vs. Unemployment Rate

By Kevin

(Warning: This is another vague, interminable rant on the quality of data).

Occasionally, the number employed rises even while the unemployment rate rises--or both decline. This is normal and explicable. One could argue that the number of people looking for jobs increased more than the number of jobs. But one could also argue that the real reason for this inconsistency is purely statistical, not economic. The series aren't really linked, so the economic explanation is an illusion.

The number employed comes from a survey of businesses, and the unemployment rate (number unemployed / labor force) comes from a survey of households. The household survey comes up with its own estimate of number employed, and uses a different definition, to boot; the only real link between the two surveys is the actual economy. The measurement processes of each survey program are incredibly different. It almost seems remarkable that the data are as close as we find them to be... The business survey indicated growth of 337,000 jobs in October and 139,000 in September. The household survey indicated growth of 298,000 jobs in October, after a decline of 201,000 in September.
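For reference, the unemployment rate is the number unemployed divided by the labor force (employed plus unemployed). A small sketch with made-up numbers shows how, within a single consistent set of counts, employment and the unemployment rate can rise together when new job seekers outnumber new jobs. The figures below are hypothetical and are not BLS data.

```python
def unemployment_rate(employed, unemployed):
    """Unemployed as a share of the labor force (employed + unemployed)."""
    labor_force = employed + unemployed
    return unemployed / labor_force


# Hypothetical month 1: 950 employed, 50 unemployed.
before = unemployment_rate(employed=950, unemployed=50)

# Hypothetical month 2: 10 new jobs are filled, but 20 people enter the
# labor force, so the count of unemployed job seekers rises by 10 as well.
after = unemployment_rate(employed=960, unemployed=60)

employment_rose = 960 > 950
rate_rose = after > before
```

This is the "economic" explanation only; it says nothing about the statistical one, where the two numbers come from different surveys that need not agree at all.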

Producing these data series is an enormous task for the Census and BLS: covering a dynamic population, utilizing bureaucratic organizations, serving heterogeneous goals, blocking (or absorbing) political pressures, conducting independent survey samples, and employing differing complex methodologies combine to yield estimates that are some of the best numbers we have about economic exchange.

The tasks that newspaper reporters set for the data require data that can expose and explain minute changes in overall employment, and employment among races. But the errors in each of these surveys are frequently larger than the granularity needed. How large? One indication is given in the article linked above:

The jobless rate for African-Americans jumped to 10.7 percent last month, up from 10.3 percent in September. The rate for Hispanics fell to 6.7 percent from 7.1 percent from the previous month, while the rate for teenagers grew to 17.2 percent from 16.6 percent. The rate for whites held at 4.7 percent.

Now, do you really think that the unemployment rates are that variable month to month? Or is this survey error? How would you extract the signal from the noise?
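One crude way to frame the signal-versus-noise question is to compare a month-to-month change with its sampling error. The sketch below assumes simple random samples of a hypothetical size (`n = 5000` labor-force respondents per month in the subgroup) and treats the two months as independent. Neither assumption matches the actual household survey design, so read it as an illustration of the logic rather than an estimate of the real error.

```python
import math


def standard_error(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


def change_exceeds_noise(p_old, p_new, n_old, n_new, z=1.96):
    """True if the change is larger than z standard errors of the difference."""
    se_diff = math.sqrt(standard_error(p_old, n_old) ** 2 +
                        standard_error(p_new, n_new) ** 2)
    return abs(p_new - p_old) > z * se_diff


# Rates from the article: 10.3% rising to 10.7%.
# n is a hypothetical subgroup sample size, NOT the survey's actual figure.
n = 5000
significant = change_exceeds_noise(0.103, 0.107, n, n)
```

Under these assumed sample sizes, a 0.4-point move is well inside the band that sampling error alone could produce; a different `n` would give a different answer, which is exactly why the size of the error matters.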

Posted at 11:44 AM | Comments (0) | TrackBack

October 21, 2004

Anecdotes vs. Data: Real Estate Edition

By Kevin

It seems that many real estate agents, like many baseball club recruiters and managers, rely on gut instincts and personal experience instead of the data:

In higher price ranges, though, agents report an uptick in the number of days properties are staying on the market. "I'm looking at the listings for Bethesda now," Jane Fairweather, an agent with Coldwell Banker Residential Brokerage in Bethesda, said last week. "Here's what I see priced between $800,000 and $900,000: 279 days on market, 83 days, 16 days, 118 days, 56 days, 17 days, 2 days, 55 days, 30, 45, 27, 67, 120."

Always remember that we have to pick and choose our anecdotes carefully; they're non-random, judgmental surveys, easily biased by not being representative of the whole. Anecdotes can provide valuable insight into areas for which no reliable data are available. But wait, that set of anecdotes above--covering a narrowly priced range of homes in a single area--didn't compare anything over time! And hard, reliable data are available on the average length of time homes have been on the market.
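To make the point concrete, here is what the agent's list actually supports, computed in Python. The thirteen figures are the ones she quoted; nothing else is assumed.

```python
from statistics import mean, median

# Days on market for the listings quoted above ($800,000-$900,000, Bethesda).
days_on_market = [279, 83, 16, 118, 56, 17, 2, 55, 30, 45, 27, 67, 120]

average_days = mean(days_on_market)   # pulled upward by the 279-day listing
median_days = median(days_on_market)  # the middle listing of the thirteen
```

The median is 55 days and the mean is a little over 70, but both describe a single snapshot. Without the same numbers for an earlier period, the list cannot show an "uptick" in anything.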

So what happens when the reporter looks it up?

Although many agents report anecdotally that homes are remaining on the market longer, statistics from the area's multiple listing service show days on market for all properties relatively flat for most local jurisdictions in September.

Now, I'm not one to ignore people whose livelihoods depend on being right about a specific market. And I understand the urge for the juicy soundbite for a reporter to nibble. But is it too much for the reporter to point out that real estate markets fluctuate all the time? That is, shouldn't we expect that--even in a sizzling market--a real estate agent would likely be able to find one neighborhood whose expensive houses are staying on the market longer than before?

Posted at 12:29 AM | Comments (2) | TrackBack

August 11, 2004

Revisiting Edmund Andrews

By Kevin

Last October, I noted with sadness that I could no longer trust The New York Times business section for Mr. Andrews' failure to substantiate claims regarding the "majority" of economists' and forecasters' views on the economy. I argued that factual assertions made in news articles require source statements; only when social facts are established within reason can we move forward in public debate.

Now, Russell Roberts is all over Mr. Andrews for a whole shopping cart full of intellectual crimes: drawing conclusions from inconclusive data, stating opinions as facts, using political sources without disinterested affirmation, and simply getting the numbers wrong. He concludes:

I suspect the New York Times reporter misread it to mean that the number of jobs in the high-paying industries is unchanged, ergo, zero job growth in the high-paying industries.

I have a call into Mr. Andrews. I'll re-post if anything changes.

But here's what's amazing and a little bit frightening. This claim that no new jobs are being created in the highest-paying industries will become what Joel Best calls a "mutant statistic." Whether it's true or not, because it was in the Times, it will get quoted and cited as fact. I don't think it is. If I'm wrong, I'll let you know.

Posted at 03:02 PM | Comments (0) | TrackBack