I've always liked John Tukey's work...
At the other extreme, they must, at almost the same time, be honest in assessing the uncertainties of their final results. In the latter they cannot be satisfied with allowance for only the likely size of "sampling errors", a task with which routine manipulations can often help them; they must, most particularly and responsibly, make explicit allowance for the likely size of "nonsampling errors", for the extent to which the data given to them was neither what it purported to be nor what it ought to have been. No other profession must support itself over so wide a span from security to insecurity. (Tukey, The Future of Data Analysis, 1965; 23-24)
Instead of studying and graphing what we know, Tukey suggested graphing what we know cannot be, and improving graphs, “by shifting emphasis from "what might be," inevitably truly fuzzy, to "what we know cannot be," which only has fuzzy edges” (Tukey, 1986, 73). He presents an evolution from the most certain estimate of what we know to the most certain estimate of what we don’t know: from the best estimate, to the likely interval, to the range of the impossible:
The difficulty we face (in economics) is that total error bounds are not known; nonsampling errors have not been investigated enough for me to come to any general empirical conclusions. For macroeconomic data, charts like those above are mere guesswork.
We simply do not know enough about nonsampling error, nor are we likely to during my entire professional career. What stance am I to take towards data? How am I to use it seriously, without prejudice or bias? That’s it: I’m not really concerned with the bias in the data as much as I am with my own personal bias in understanding and using it. I want to know, if the unemployment rate is 5.4%, what kinds of stories using that number are real, and which are imaginary?
Here's another appendage mercifully sliced from the dissertation proposal. I have no idea why I wrote an elementary discussion of accuracy vs. precision in the first place; upon further review, I'm rather embarrassed that I didn't obliterate it months ago...
In physical science, the terms accuracy and precision have specific meanings generally accepted by practitioners. Accuracy measures how close any single estimate (or average of a group of estimates) is to the true value of interest. Precision measures how close several estimates are to each other.
For an example in economics, an accurate measure of employment growth would be within x% of actual employment growth, with x selected arbitrarily. An "equally" precise measure of employment growth would have two standard measures of employment growth generally within x% of each other. Another example of precision of x% would be if a single series were regularly revised upward or downward by that amount.
Revisions of statistics can be treated as a means of gauging the precision of a macroeconomic data series, but not its accuracy.
Both accuracy and precision are objective properties of the measuring tools and processes used; however, measuring accuracy requires a generally accepted benchmark, while measuring precision requires only the data at hand.
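The distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vintages and the benchmark below are made-up numbers, and the standard deviation is just one possible summary of how close several estimates are to each other.

```python
import statistics

def accuracy_error(estimates, benchmark):
    """Accuracy: how far the average estimate sits from an accepted benchmark."""
    return statistics.mean(estimates) - benchmark

def precision_spread(estimates):
    """Precision: how close the estimates are to each other (sample std dev)."""
    return statistics.stdev(estimates)

# Hypothetical successive revisions of one employment growth figure (percent).
vintages = [1.8, 2.1, 2.0, 2.3]

# Hypothetical benchmark; in practice an accepted "true" value is rarely available.
benchmark = 2.5

spread = precision_spread(vintages)        # needs only the data at hand
bias = accuracy_error(vintages, benchmark)  # needs the benchmark

print(f"spread across vintages: {spread:.3f}")
print(f"average error against benchmark: {bias:.3f}")
```

Note that the spread can be computed from the revisions alone, while the error term cannot be computed at all without the benchmark.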
Most variables in physical science are estimated using commercially developed tools (telescopes, microscopes, etc.) that have been tested under many conditions to yield a known level of accuracy, a level that can then be used as a portion of the total measurement error. In physical science, “measurement error” is addressed through repeated measurement of the same variable under tightly controlled conditions. That is, all biases are assumed to be irrelevant, to cancel out. Repeated measurement is meant to gauge the precision of the particular measurement process with particular tools; accuracy is assumed not to be the concern of statistical analysis.
In economics, “measurement error” is not similarly addressed. The statistical distributions of measurement must cover both accuracy and precision. Most variables in social science are estimated using surveys of individuals, households, and businesses. Each of these units is sampled once, yielding an unknown level of accuracy at the unit level and consequently at higher aggregation levels. Sometimes, as with employment figures, more than one independent estimate is made of the same concept; after reconciling the differences in construction, it is possible to compare the estimates, and develop a measure of the joint precision of the two measurement processes.
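As a rough sketch of that last comparison: given two independent series for the same concept, already reconciled for differences in construction, one could summarize their disagreement with a root-mean-square discrepancy. The series names and values below are hypothetical, and the RMS measure is my assumption here, not an established convention for this purpose.

```python
import math

def rms_discrepancy(series_a, series_b):
    """Root-mean-square gap between two estimates of the same concept."""
    if len(series_a) != len(series_b):
        raise ValueError("series must cover the same periods")
    diffs = [a - b for a, b in zip(series_a, series_b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical monthly employment growth (percent) from two independent surveys.
household_survey = [0.2, 0.1, 0.3, -0.1]
payroll_survey = [0.1, 0.2, 0.2, 0.1]

joint_precision = rms_discrepancy(household_survey, payroll_survey)
print(f"RMS discrepancy: {joint_precision:.3f} percentage points")
```

A small discrepancy says only that the two processes agree with each other; both could still be far from the true value.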
Here's another "think piece" cut from my dissertation, in which I decided to reread and comment on portions of Mises...
The quality of the commodities produced and consumed changes continuously. It is a mistake to identify wheat with wheat, not to speak of shoes, hats, and other manufactures. The great price differences in the synchronous sales of commodities which mundane speech and statistics arrange in the same class clearly evidence this truism. An idiomatic expression asserts that two peas are alike; but buyers and sellers distinguish various qualities and grades of peas. A comparison of prices paid at different places or at different dates for commodities which technology or statistics calls by the same name, is useless if it is not certain that their qualities--but for the place difference--are perfectly the same (Mises, 1996, pp. 220-1).
Is this really true? Are price indices “useless” if they are taken over commodities of non-homogeneous quality? Shouldn’t it be possible to quantify--or describe--the errors involved in the index-number creation process, to separate the signal from the noise? Yes, one must explicitly hold quality constant. Only if quality is not measurable or estimable within useful margins is it reasonable to reject the index-number procedure.
[T]here exist different methods for the computation of averages. There are the arithmetic, the geometric, the harmonic averages, there is the quasi-average known as the median. Each of them leads to different results. None of them can be recognized as the unique way to attain a logically unassailable answer. The decision in favor of one of these methods of computation is arbitrary (Mises, 1996, pp. 221-2).
Why is Mises talking about this? He's trying to show that Irving Fisher’s desired adjustment of the quantity of money to meet purchasing power is not a logically proven method. Sure, any method chosen is arbitrary, but VERY frequently the most popular estimators yield estimates in the same direction. Some decisions can be made reliably with this data. Others cannot. What is absent in Mises’ formulation is the recognition of the uses of data, to which each average must be benchmarked for appropriateness. The diversity of calculation methods is actually a diversity of estimators of the “middle”, and there exist tradeoffs between these estimators. While there exists no “logically unassailable estimator” for a given population parameter, there do exist minimum variance unbiased estimators and the like.
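A small sketch makes the point concrete. The price relatives below are invented for illustration, and the four averages are the ones Mises lists, computed with Python's standard statistics module.

```python
import statistics

# Hypothetical price relatives (new price / old price) for five goods.
price_relatives = [1.02, 1.10, 0.97, 1.25, 1.04]

averages = {
    "arithmetic": statistics.mean(price_relatives),
    "geometric": statistics.geometric_mean(price_relatives),
    "harmonic": statistics.harmonic_mean(price_relatives),
    "median": statistics.median(price_relatives),
}

for name, value in averages.items():
    # A value above 1 means this average says prices rose.
    direction = "up" if value > 1 else "down or flat"
    print(f"{name:>10}: {value:.4f} ({direction})")
```

With these numbers the four averages differ in size, as Mises says they must, yet all of them land above 1. Whether that agreement in direction is enough depends on the decision the number is meant to support.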
Mises has a far more potent argument when he insists that anybody’s carefully chosen measurements of their own purchases are just as scientific as index numbers:
A judicious housewife knows much more about price changes as far as they affect her own household than the statistical averages can tell. She has little use for computations disregarding changes both in quality and in the amount of goods which she is able or permitted to buy at the prices entering into the computation. If she “measures” the changes for her personal appreciation by taking the prices of only two or three commodities as a yardstick, she is no less “scientific” and no more arbitrary than the sophisticated mathematicians in choosing their methods for the manipulation of the data of the market (Mises, 1996, pp. 222-3).
Now this is a cute analogy. But what else can one say about this? For Mises, it’s either science, or it ain’t. There seems to be little point in seeing how much better the average housewife can do than the statistician, even if both are not “scientists”. But one sees why Mises holds this opinion; for Mises, there is stable statistical truth about populations, but market processes destroy (what I'll call) continuity. Index numbers are estimates of discontinuity, and work when estimating large areas of discontinuity, but fail utterly in an attempt to gauge small changes. I think most economists will recognize that when one is trying to measure the average movement of millions of prices, there’s simply no reason that movement must follow a known general pattern.
Hence, choose your index number to favor your ideology; in the end political arguments shift to irreconcilable arguments over statistical methodology:
[N]obody acquiesces in an index number if he does not expect a personal advantage from its acknowledgement by public opinion. The establishment of index numbers does not settle disputes; it merely shifts them into a field in which the clash of antagonistic opinions and interests is irreconcilable (Mises, 1996, p. 223).
The first sentence above suggests the potential existence of 1) undue influence in the calculation of index numbers, and 2) a theory of collective choice over statistics.
The second sentence suggests that a plethora of data exist because production of social statistics is a game, one in which all sides have every incentive to hide their shoddy calculations. Non-technical individuals have no means to challenge the data. All sides realize this, and now have a “gentleman’s agreement” not to discuss the dirty details (biasedness, large variance, imprecision of population, the unknown applicability to real world problems) of index calculation. Everybody gets their turn to use bad data, and the costs of declaring that your opponent’s data are bad include having them declare that your data are no better.
Yet, there is a system of measurement even Mises could consent to:
A datum of experience and a statistical fact is only a price paid at a definite time and a definite place for a definite quantity of a certain commodity. The arrangement of various price data in groups and the computation of averages are guided by theoretical deliberations which are logically and temporally antecedent (Mises, 1996, p. 351).
One jumps to reply, “So let them be guided by such deliberations!”
No aprioristic theory exists to determine the exact outcomes of changes in law that apply to the economic system. In fact, Mises is fine with data, as long as you are not trying to construct economic theory out of it. If one is trying to estimate the impact of rule changes, and such changes have occurred elsewhere, Mises would applaud looking at the historical record as a guide for lawmakers and businessmen. But this is not economic theory, and there is no way to determine where—inside or outside the historical range—actual outcomes will lie.
(Regarding) Economics=prediction=forecasting.
As against mathematical economics the request for a dynamic theory is well substantiated. But there is no means for mathematical economics to comply with this request. The problems of process analysis, i.e., the only economic problems that matter, defy any mathematical approach. The introduction of time parameters into the equations is no solution… (Mises, 1996, p. 356).
What about the persistent elaboration of the distribution of prices, wages, etc., from one month to another, estimating frequency distributions, and using changes in the parameters of those distributions to view economic change, historically? This is clearly acceptable to Mises, but it does not generate a permanent theory of economics.
------
And what of theory? Daniel Bell in Models and Reality in Economic Discourse is all over the map:
Economic theory, unlike physics, is not constitutive of a single underlying reality. Nor can it be, pace Alfred Marshall (and Gary Becker), timeless generalizations about human behavior. In consequence, economics cannot be, as its model in classical mechanics, a “closed system” which ignores change or the effort to discern specific patterns of change. (Bell, 1981, p.80).
Quite frankly, arguments like this get tiring after a while. If I want to construct a static, timeless, and a priori theory, that’s my business. If I want to construct a model that ignores process, that’s my business. And I emphatically deny the following:
[E]conomic theory should not be taken as a “model” (or template) of how human beings behave, for these will always be inadequate, but as a “Utopia,” a set of ideal standards against which one can debate and judge different policy actions and their consequences. That, it seems to me, is the meaningful role of any social “science” in theorizing about human affairs. (Bell, 1981, p. 80).
The idea that economics is a science of benchmarking the frail reality against the impossible ideal has a substantial draw for many. But what if one disagrees with the benchmark? One is left in Mises’ quandary about the best index number to use as a benchmark, with no scientific way to resolve the issue.
[Note 1/13/05 | 15:36: Formatting and verbal changes have been made.]
I don't remember writing this paragraph, or even thinking about it:
Let's examine the price of wheat in Smith’s Wealth of Nations--an extensive historical series compiled from one major source and many minor sources. We ask a simple question, "what are these prices supposed to represent?" The median exchange price, the mean exchange price, the first or last price of the year, the average price on a certain day? What is the error in this data? What does this imply about markets, people, processes, end states (equilibria), welfare, statistical procedure?
That's an entire dissertation in itself.
As most of you know, I'm writing a doctoral dissertation on error in economic data. As motivation I wrote small "stories" exploring ideas and concepts that were useful to me. Unfortunately, most of these stories had to be deleted to leave room for the scholarly material. However, they will be reproduced here in a series entitled "Deletions from a Dissertation". Here's the first:
With persistence, it is possible to uncover data about almost every aspect of the natural and manmade worlds. Do you want to know the number of cars on the street in your town at 3AM, or the number of books in the local library, the number of trees in a local park, or the average number of manhole covers per Manhattan city block? Some person probably knows or can calculate usefully accurate estimates of these population parameters, because he has made it his business. (Finding this person is another matter.) People crunch and store data when the numbers are to be used to monitor or solve a problem they are interested in. Uncovering data for data’s sake is a worthless enterprise.
Equally worthless is generating publicly available data for “problems” that are already “solved” or “monitored” by spontaneous orders, institutional frameworks, business enterprises, and professional cadres. Do you concern yourself with the speed and throughput of national check processing systems, or the efficiency of late-night maintenance operations of a shopping mall, or with ensuring that enough Pop-tarts are on the shelves during a hurricane? Of course not—unless it’s your occupation to manage or work in these positions; the division of labor sets aside a separate private sphere so that check writers, mall shoppers, and Wal-Mart vendors can address their own specific problems without public scrutiny.
Why do some people, passionately driving for “social justice” through legislative or administrative action, acquire data to tackle one perceived injustice, but tacitly trust a whole network of systems that could just as likely endanger them? As a specific instance, why do the same people who daily trust their lives to an engineer who computes bridge stress tolerances, mock an economist who computes an output forecast wildly deviating from consensus? The short answer presented here is that long experience of repeated trial-and-error has created extraordinary expectations of the performance of the engineer, and almost complete unfamiliarity with the actual performance of the economist.
Public data about a system are acquired when people have no basis for trusting that system, regardless of the actual outcomes the system produces.
Engineers have demonstrated the soundness of their theories and data through repeated testing by those who have driven over road bridges. The honesty of engineers is persistently tested against reality; of course bridges do fail, but only under extraordinary circumstances, which is precisely how bridges are intended to function. In bridge engineering, issues of competency, reliability, quality, and fraud are left to narrow interest groups, federal and local government regulators, and sometimes advocacy groups. The average trucker does not have an opinion of bridge quality, except perhaps for preferring a smooth road surface to a river of pot-holes.
In contrast, discussion of political economy is almost universal; talk about the economy is a social and political phenomenon independent of professional economists’ superior understanding of theory and performance. Historically, economic issues have been discussed at some level by almost everyone (politicians, press, punditry, and populace) everywhere, during every epoch. Unlike engineers, economists are not routinely tested by the real world; hence, economists cannot use the only clear means of convincing others (or one another) that their theories are sound or that their data are truly trustworthy. Today, “the economy” is considered a problem without a one-time solution; in good times and bad times, the necessity of somebody tinkering with the economy is the only certainty.
For example, despite the comparative wealth of free-trading nations, there is no simple, everyday, repeatable test to demonstrate the correctness of the results of the theory of comparative advantage. The political discussion of free trade does not die down. The role of political institutions in the economy is always open for discussion and controversy. Economic issues are not left to professional economists, regulators, and advocacy groups to sort out.
Everyone “has an opinion” on economic matters. With the advent of government statistical bureaus in an age of mass media, data releases have become public events, spectacles, part of the core of routine democratic public life. People want public data about “the economy” because they believe, rightly or wrongly, that the economy is not running properly. The level of public concern is as if, every few years, the economic “bridge” were to fall down while everyone was driving over it.
Most members of the general public have a limited ability to understand the requisite physical theory and mathematics of bridge construction and maintenance. Because it would be ridiculous for automobile drivers to stop and judge the quality of every bridge crossed, a system of law and informal institutions evolved such that drivers can and do trust—without doubt or second thoughts—the stability and safety of bridges, and trust that the engineers who design and build them are accurately portraying their structures’ strengths and weaknesses. The institutions encompassing the engineering disciplines and professions are so highly developed—and so trusted—that very few people think about safety at all until an infrequent failure occurs.
What could be termed “the bridge system” has been simplified to the point where only a few signs and indicators are needed for it to run smoothly. Bridges are designed so that the average trucker does not have to think hard, or need to know in detail how bridges are built; when encountering a bridge, he needs to know only two pieces of information: the height and weight of his truck. Passing under a bridge is simple. As long as the minimum vertical clearance of a bridge (usually posted accurately to within an inch) is greater than truck height, a trucker knows he can pass under. Riding over a bridge is not much different—a trucker must not pass over if the bridge tolerance is less than his truck’s weight.
However, the weight tolerance of a bridge is not simple to measure; the actual weight tolerance of a given bridge is usually much higher than posted limits. To ensure that no trucker pushes the limits, engineers deceive the public into thinking that the bridge is far weaker than it actually is.
But of course most of the time these bridge limits do not have to be examined at all. Truck routes are pre-planned, widely known, standard paths with all road surfaces meeting clearance and weight tolerance levels. Truckers do not stop on the road unless they have to.
The general public’s same limited ability to understand difficult theories applies to understanding of political economy. Only in this case, the public insists on a continuous “pulse-taking” of the system. Policymakers regulate, control taxation and government expenditure, and frame the rules of the economic game. In order to understand the actual effects of these complicated government policies (or lack thereof), those who want to understand the economy (let’s call them “voters”), need longitudinal and time series data about macroeconomic performance. But these data, as actually produced, have only limited scope, application, and adequacy—even with cutting-edge safeguards to ensure data integrity and accuracy. Hence, in their drive to understand the economy, voters must “stop at every bridge” to check economic data quality; it is as if they do not trust the posted weight tolerances or clearances. But when they try to check for quality, they find that nobody can tell them how accurate are the measurements of the clearance and weight tolerance of the bridges they have to use.
The economic system is so complex that anybody trying to navigate it must learn in detail the construction of economic data and the workings of the political process. To judge “the performance” of “the economy” requires more than estimates of aggregate output, employment, interest rates, and inflation—the Keynesian economists’ bridge clearance and tolerance. Economic data do not come with a simple context for comparison. A trucker can compare truck height to clearance, and truck weight to stress tolerance. To what can a voter compare historical data and forecasts of CPI, GDP, the overnight lending rate, and non-farm payroll employment? Even if accurate, what can he do with those numbers?
Based on revealed preference, the general public believes it needs economic data, even though these data are not what they appear. Even if they were accurate, they cannot be used in a simple manner to understand or judge the economy. What voters really want is an economic system they can trust without thinking about it, just like the bridge system. This being practically (and politically) impossible, voters utilize their second-best solution: a straightforward means of using data to judge the performance of the economic system. The public task of economics is not to demonstrate in detail the accuracy and relevance of the data; the public does not want economists to show off. Instead, the public task of economists is to make those expert judgments, and present them to the public through macro and micro media. Much of macroeconomics is grounded in a debate about the relative outcomes of alternative policies proposed or actually enacted. The data being more than imperfect, economists cannot simply state numbers to make convincing arguments about which policies are “best”. To tell the public what the data actually imply about the economy, economists must incorporate the data’s likely error into their analyses. They must already have determined whether macroeconomic data are accurate enough to make decisions about which policies should be pursued. If the data about a system cannot be had accurately enough to choose between options, is it not irrational for people to concern themselves with learning and modifying the system? How is a choice between options using inadequate data different from that same choice using no data at all? Aren’t both equivalent to throwing darts?