Imagine you are tasked with investigating the effect that changes in household income have on a certain variable, such as the risk of war. Unfortunately, war can also affect income growth, so how can you disentangle the two-way causality? Check the weather.
In fact, this is precisely the approach used in one of the most influential papers on the relationship between economic growth and civil violence. As Collier notes in The Bottom Billion, because many developing countries depend on agriculture, getting too little or too much rain can severely affect growth. But fortunately for us economists, "prospective rebels do not say, 'it's raining, let's call off the rebellion'". As such, rain functions as an instrumental variable: it lets us isolate the effect of growth on war while avoiding the effect that war has on growth. Beyond the study of civil conflict, rainfall shocks have long been used to investigate a diverse range of issues, from the role of remittances as insurance to human capital accumulation and sex selection. While rainfall shocks seem like quite an obvious tool after the fact, I cannot help but smile at the thought of using them as such a, pardon the pun, instrumental part of research on development.
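To make the instrument logic concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the coefficients (`beta`, `feedback`), the normal shocks, and the function names are all made up and are not taken from the paper. Growth lowers war risk, war feeds back into lower growth, and rainfall moves growth but does not enter the war equation directly. The naive regression slope picks up the feedback, while the instrumental variable ratio should land close to the true effect.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, beta=-0.5, feedback=-0.8, seed=0):
    """Simulate rain, growth and war with two-way causality.

    Hypothetical structural equations (illustrative only):
        growth = rain + feedback * war + e_g
        war    = beta * growth + e_w
    Rain affects war only through growth, which is the key IV assumption.
    """
    rng = random.Random(seed)
    rain, growth, war = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)    # rainfall shock (the instrument)
        e_g = rng.gauss(0, 1)  # other shocks to growth
        e_w = rng.gauss(0, 1)  # other shocks to war risk
        # Solve the two simultaneous equations for growth, then war.
        g = (z + e_g + feedback * e_w) / (1 - feedback * beta)
        w = beta * g + e_w
        rain.append(z)
        growth.append(g)
        war.append(w)
    return rain, growth, war


def ols_slope(x, y):
    """Naive regression slope of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV (Wald) estimate of the effect of x on y."""
    return cov(z, y) / cov(z, x)


if __name__ == "__main__":
    rain, growth, war = simulate()
    print("true effect of growth on war: -0.5")
    print("naive OLS estimate:", round(ols_slope(growth, war), 3))
    print("IV estimate using rain:", round(iv_slope(rain, growth, war), 3))
```

The whole exercise hinges on the assumption that rain does not appear in the war equation. If rain affected conflict through some other channel, the IV ratio would be biased too.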
It also makes me smile because it excites me about what other data sources economists will have to leverage in the future. For example, an important part of Mian and Sufi's work on the effects of subprime mortgages was Saiz's housing supply elasticity data. Saiz calculated these elasticities for metropolitan areas based on very specific geographic properties, such as the percentage of area covered by water or the presence of steep terrain. He was able to generate such a thorough dataset by using satellite and topographic data. Such computations, while impossible a few decades ago, are now much simpler. From the comfort of my apartment, I can easily pull up a street-level map of New Delhi* and customize it using open-source R. And if even I can manipulate such powerful tools from my laptop, just imagine the new opportunities that could open up as the result of concerted research.
Other writers have commented on this "new generation" of economic data, but I think the studies discussed above add a little color to what more data really provides us.
It's tempting to say that more data will give us more correlations to work with and better predictive power. This is not necessarily the case, as the number of spurious and uninformative correlations necessarily increases as the amount of data analyzed rises. However, something Big Data does give us is a better way to organize all the "natural" data sitting out there in the world. When Watson was introduced, attention shifted to what a "personal Watson" could do for tasks that involve large database searches, such as medical care or legal research. There is no reason for economists not to share in these benefits. Many well-known studies pivot on a very clever design, whether rainfall shocks or regression discontinuities based on geography. Thus Big Data may become less of a tool for direct prediction, and instead become an indispensable tool for economists to identify and deploy increasingly clever instruments and natural experiment designs.
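The point about spurious correlations is easy to check with a toy simulation. The sketch below is purely illustrative: the sample size, the 0.3 cutoff, and the function names are my own assumptions. It draws a random "outcome" and then counts how many completely unrelated noise series happen to correlate with it above the cutoff as the pool of candidate series grows.

```python
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def spurious_counts(n_obs=50, sizes=(10, 100, 1000), threshold=0.3, seed=0):
    """Count noise series whose |correlation| with the outcome exceeds threshold.

    Every series is independent of the outcome by construction, so every
    "hit" is spurious. Counts are reported for the first k series, k in sizes.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = []
    for _ in range(max(sizes)):
        noise = [rng.gauss(0, 1) for _ in range(n_obs)]
        hits.append(abs(corr(outcome, noise)) > threshold)
    return {k: sum(hits[:k]) for k in sizes}


if __name__ == "__main__":
    for k, count in spurious_counts().items():
        print(k, "candidate series ->", count, "spurious 'relationships'")
```

Nothing here is related to anything else, yet the number of apparent relationships can only grow as more series are added, which is why a credible design matters more than the raw volume of data.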
This kind of "data mining" would be less about finding correlations and more about enriching the datasets that we have available. As I found out this year while working with the AHS on a housing finance project, privacy is a big deal in surveys. But with the possibility of estimating non-economic public variables such as weather or geography, we have ever more powerful tools for estimating parameters for large groups while preserving the privacy of individual people. And even though merging individual entries across multiple datasets is always difficult, such common public variables would allow us to create a base set of variables to enrich any dataset and analysis.
This change in data capabilities also has implications for the intellectual tools needed by economists to understand the data. While rigorous econometrics, especially spatial econometrics, will stay very important, it may become more important than ever to have a solid foundation in economic history. In the wake of the financial crisis, it has been fashionable to talk about how economic history would have given us a better idea of how to respond to the crash. Yet even beyond these policy implications, a better understanding of economic history could motivate the mining of old data sources, such as newspapers.
A Google Scholar search for "rainfall shocks" or "rainfall shock" yields about 1,400 results. What will be the future analogous tool for other fields of economics?
*On Google Maps, go to New Delhi and start scrolling to the west. While you are amazed by the ability to discern individual streets and apartment buildings, observe the dramatic change to a checkerboard of individual farm plots. In fact, the first time I saw this I thought the graphics on my computer had messed up the rendering.
Curb Your Enthusiasm on that rain stuff. See Angus Deaton's critique of it: external vs. exogenous.
Out of curiosity, Yi-chuan, have you seen Perry Mehrling's interview with the statistician Cosma Shalizi? Mehrling interviewed Shalizi because he had received a grant from INET aimed at introducing more up-to-date statistical techniques for economists to use in their research.
If you have not, then here it is.
http://ineteconomics.org/video/30-ways-be-economist/cosma-shalizi-why-economics-needs-data-mining
P.S. Do you look forward to learning about Non-Parametric Statistics still? :-P