U.S. tax returns form the basis of the world’s largest dataset – Quartz

In the past three months, approximately 150 million American households have filed their taxes. In doing so, they not only funded the US government and filled the coffers of H&R Block and TurboTax; they also helped create one of the world’s largest data sets, a trove of numbers that has changed what we know about the state of the American Dream.

When Americans file their taxes, the Internal Revenue Service (IRS) keeps all the information associated with each filing. Besides income, this includes details such as the age and Social Security number of each member of a household. This information is useful not only for identifying tax evasion and assessing the effects of changes to the tax code; it is also the best data set for assessing income inequality and the chances that a child born poor can become rich. It is even better than the data collected by the US Census Bureau, because it is collected annually and based on real income data, whereas the census relies on survey questions.

The best-known early analysis of data from tax returns is a 1955 study by Nobel Prize-winning economist Simon Kuznets. Kuznets used the data to examine the level of inequality in the United States over time. He found that income inequality increased in the first part of the 20th century, when the country was poor, but then narrowed in the 1940s. With this information, he developed the hypothesis that economic growth increases inequality in poor countries but reduces it in rich countries. (This influential hypothesis, commonly referred to as the “Kuznets curve,” is now disputed because inequality has recently increased in many rich countries.)

Kuznets’ access to IRS data was unusual. The IRS doesn’t give access to just any researcher who requests it, because people wouldn’t want anyone to have access to their personal financial details (just ask Donald Trump). And before computers, tax data was a chore to analyze. For these reasons, for most of the 20th century, household-level data from income tax returns was difficult to obtain, and analyses were scarce.

Over the past 15 years, the use of this data has increased. Armed with more computing power, economists have become interested in large and complex data sets, and the IRS has become more willing to share. Economist Timothy Taylor says this is part of the “data revolution” in economics, which has underpinned a number of the most important recent studies in the field.

Perhaps the most important recent study (pdf) using tax data was conducted by economists Thomas Piketty and Emmanuel Saez and published in 2003. Piketty and Saez used the data to show that the share of income going to the highest earners in the United States increased considerably in the latter part of the 20th century, from about 8% in 1980 to over 14% in 1998. Their discovery would reshape the debate on inequality and inform Piketty’s best-selling book Capital in the Twenty-First Century. (There is now a lively discussion (pdf) over whether Piketty and Saez overestimated the increase in inequality.)

More recently, Stanford economist Raj Chetty used tax data to examine economic mobility between generations. Chetty and his co-authors found that because of rising inequality, Americans are less and less likely to earn more money than their parents. They also found that black men are much more likely than white men to move down the income ladder, and much less likely to move up.

Another study, published earlier this year, used data from the IRS to examine the relationship between parental income and the likelihood of a child ending up in prison as an adult. The study found that boys born to households in the poorest 10% by income were 20 times more likely to be incarcerated on any given day in their early 30s than children born to households in the richest 10%.

These recent studies have been revealing, but they are also a cause for concern. Although access is greater today, the IRS still accepts only a small number of study requests each year. The agency does not have a large budget to support research.

This has “unequal consequences,” according to economist and blogger Tyler Cowen. Only a small number of researchers have the time and support to access IRS data. In the journal Science, Jeffrey Mervis described in detail the obstacles that researchers must overcome to obtain the most detailed information. As a result, most of the researchers who succeed come from the best-resourced colleges. Brown University economist John Friedman has compiled a list of researchers (pdf) who got the data, and almost all of them are from elite schools.

Given the incredible power of tax return data in explaining the U.S. economy, it’s important that more researchers can analyze it. The US government should find a way to improve access and in doing so generate some goodwill for an agency that Americans love to hate, especially at this time of year.
