Saturday, February 21, 2015

How can social scientists be scientific?

The empiricist in economics and finance is concerned with describing behavior with data. The challenge for the social sciences more generally is that we use observational data rather than data gathered from controlled experiments. Observational data captures choices and outcomes that we can observe but that we, as researchers, do not influence. For example, we may observe that investors bid up a stock price one day. Did the price move up because of unexpected news specific to the company (a new CEO), the industry (better industry performance) or the overall market (higher expectations for macroeconomic growth)? To explain behavior in finance and economics, we need a sense of whether X causes Y.

Observational data contrasts with data from controlled experiments. In research labs, scientists alter behavior directly by controlling a key variable, such as the expression of a certain gene. Scientists hypothesize the different channels through which X affects Y and gradually eliminate channels with experiments. The social science researcher cannot alter someone's aversion to risk, change someone's IQ, move people between states or afford to redistribute wealth. Observational data is full of choices made by individuals. The social scientist is also trying to figure out whether X causes Y, but needs to use tricks in order to make such causal statements.

Let's work through a fundamental example. An important question in the economics literature for decades has been "What are the returns to schooling?" At first glance, the answer seems obvious: schooling should cause higher salaries or other better outcomes. But there are other equally plausible stories that get to the heart of whether schooling itself drives higher salaries or whether something else correlated with schooling does. For example, years of schooling may be correlated with biological IQ (if that exists). Higher-IQ students may choose to get more schooling because the marginal effort of finishing homework and taking tests is relatively lower for them. Higher-IQ students may also get more out of the course material each year. In this alternative story, schooling does not cause higher salaries but rather reveals, or signals, differences in students' IQ.
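A small simulation can make the concern concrete. The sketch below is entirely hypothetical: it invents a data-generating process in which an unobserved IQ score raises both years of schooling and log salary, sets the true return to a year of schooling at 0.05, and then regresses log salary on schooling alone. The names `simulate` and `ols_slope` and every parameter value are assumptions made for illustration, not estimates from any study.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

def simulate(n=50_000, true_return=0.05, iq_effect=0.10, seed=1):
    """Hypothetical data: IQ raises both schooling and (log) salary directly."""
    rng = random.Random(seed)
    schooling, log_salary = [], []
    for _ in range(n):
        iq = rng.gauss(0.0, 1.0)                  # standardized ability, never observed
        s = 12 + 2.0 * iq + rng.gauss(0.0, 1.0)   # higher IQ -> more years of school
        y = 1.0 + true_return * s + iq_effect * iq + rng.gauss(0.0, 0.2)
        schooling.append(s)
        log_salary.append(y)
    return schooling, log_salary

schooling, log_salary = simulate()
naive = ols_slope(schooling, log_salary)   # regression that leaves IQ out
print(f"true return per year: 0.050, naive estimate: {naive:.3f}")
```

Because IQ is left out of the regression, the slope picks up part of IQ's effect on salary. In this setup the omitted-variable bias works out to 0.10 × cov(schooling, IQ) / var(schooling) = 0.10 × 2 / 5 = 0.04, so the naive slope should land near 0.09 rather than 0.05.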

Knowing whether schooling actually causes higher salaries is important for policy choices. On the one hand, if IQ is what really matters, then subsidizing schooling for the masses may break down the signaling effect: lower-IQ students may spend more time in school, accumulate more debt and still earn relatively lower salaries. On the other hand, if schooling is driving the returns, then subsidizing schooling may be the correct approach.

Social scientists look for ways to approximate the controlled experiments of life sciences research. On the returns to schooling, some authors have used the month a child is born, which is not usually a choice for mothers, to study how age and maturity relate to schooling. Students must be 5 years of age on or before September 1 to start kindergarten that year, so a child born on September 1 starts as the youngest in the class, while a child born one day later waits a year and starts as one of the oldest. The youngest child in kindergarten is at a disadvantage relative to the older, more mature child, and again the start year is not a choice but is regulated by the government. This regulation is independent of the child's skill, family background, etc. One can therefore use the regulation as a treatment to test whether a persistent disadvantage, being the youngest in the class, results in worse performance in later years.
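The assignment rule in the paragraph above is mechanical, which is what makes it useful. As a minimal sketch, assuming exactly that rule (age 5 on or before September 1), the helper functions below compute the year a child starts kindergarten and the child's age in days on the cutoff date of that year. The function names and constants are hypothetical; cutoff dates differ across places, so the cutoff is kept in constants that are easy to change.

```python
from datetime import date

CUTOFF_MONTH, CUTOFF_DAY = 9, 1   # assumed rule: "5 years of age on or before September 1"
ENTRY_AGE = 5

def kindergarten_start_year(birth: date) -> int:
    """First year in which the child is 5 on or before the cutoff date."""
    # Comparing (month, day) pairs means Feb 29 birthdays need no special casing.
    if (birth.month, birth.day) <= (CUTOFF_MONTH, CUTOFF_DAY):
        return birth.year + ENTRY_AGE
    return birth.year + ENTRY_AGE + 1

def age_in_days_at_entry(birth: date) -> int:
    """Age in days on the cutoff date of the year the child starts kindergarten."""
    start = kindergarten_start_year(birth)
    return (date(start, CUTOFF_MONTH, CUTOFF_DAY) - birth).days

for b in (date(2009, 9, 1), date(2009, 9, 2), date(2010, 8, 31)):
    print(b, kindergarten_start_year(b), age_in_days_at_entry(b))
```

Children born on September 1 and September 2 of the same year end up in different cohorts, while a child born September 2, 2009 and one born August 31, 2010 share a class despite being 363 days apart in age. Differences like these are set by the calendar, not by the family.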

One can think of skill, family background and age as three different pathways leading to schooling choices and future salaries. Like the life scientist, we want to study one pathway at a time by "knocking out" the other pathways. The regulation isolates the age pathway. With tricks like this one, social scientists can approximate the experimental design of the life sciences.
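One standard way to exploit an assignment rule like this is as an instrumental variable. The sketch below applies the Wald estimator, the simplest instrumental-variable estimator for a binary instrument, to simulated data; it illustrates the technique and is not a reconstruction of what the studies mentioned above did. It assumes, hypothetically, that a child born just after the cutoff (z = 1) ends up with half a year more schooling on average, that z is unrelated to IQ, and that z affects salary only through schooling. That last assumption, the exclusion restriction, is the "knocking out" step: it would fail if relative age also affected salaries directly. All names and parameter values are illustrative.

```python
import random

def mean(values):
    return sum(values) / len(values)

def wald_estimate(z, s, y):
    """Wald IV estimate: jump in mean outcome y across the binary instrument z,
    divided by the jump in mean treatment s across z."""
    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    s1 = mean([si for zi, si in zip(z, s) if zi == 1])
    s0 = mean([si for zi, si in zip(z, s) if zi == 0])
    return (y1 - y0) / (s1 - s0)

def simulate(n=100_000, true_return=0.05, first_stage=0.5, seed=7):
    """Hypothetical data in which unobserved IQ confounds schooling and salary."""
    rng = random.Random(seed)
    z, s, y = [], [], []
    for _ in range(n):
        zi = 1 if rng.random() < 0.5 else 0   # 1 = born just after the cutoff
        iq = rng.gauss(0.0, 1.0)              # unobserved, independent of zi
        si = 12 + 2.0 * iq + first_stage * zi + rng.gauss(0.0, 1.0)
        # zi enters salary only through schooling (the exclusion restriction)
        yi = 1.0 + true_return * si + 0.10 * iq + rng.gauss(0.0, 0.2)
        z.append(zi)
        s.append(si)
        y.append(yi)
    return z, s, y

z, s, y = simulate()
estimate = wald_estimate(z, s, y)
print(f"Wald estimate of the return to schooling: {estimate:.3f}")
```

Because z is independent of IQ, the IQ channel cancels out of both differences in means, so the ratio should come out close to the true return of 0.05 even though IQ is never observed.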

If a social scientist cannot identify a specific "treatment" variable outside the control of the actors, another possibility is to build a model. A model describes the behavior one would expect given some plausible initial assumptions; it formalizes the reasoning. The social scientist then brings the model to the data by testing the hypotheses it generates. For example, if a model makes four strong predictions and the data match all of them, then the evidence is consistent with the model.
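The "match all of the predictions" step can be written down explicitly. A minimal sketch, assuming each prediction is a sign (a positive or negative effect) and that the data supply an estimate and a standard error for each one; the class and function names, and the use of 1.96 (the usual large-sample cutoff) as the critical value, are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    """One sign prediction from a model, paired with an estimate from the data."""
    description: str
    predicted_sign: int   # +1 if the model predicts a positive effect, -1 if negative
    estimate: float       # estimated effect from the data
    std_error: float      # standard error of that estimate

def supported(p: Prediction, critical: float = 1.96) -> bool:
    """True if the estimate has the predicted sign and |t| exceeds the critical value."""
    t_stat = p.estimate / p.std_error
    return p.predicted_sign * t_stat > critical

def consistent_with_model(predictions: list[Prediction]) -> bool:
    """The data are consistent with the model only if every prediction is supported."""
    return all(supported(p) for p in predictions)

# Placeholder estimates, made up purely to exercise the functions.
preds = [
    Prediction("prediction 1", +1, 0.30, 0.10),
    Prediction("prediction 2", -1, -0.25, 0.08),
    Prediction("prediction 3", +1, 0.12, 0.05),
    Prediction("prediction 4", -1, -0.40, 0.15),
]
print(consistent_with_model(preds))
```

Passing every check only shows that the data are consistent with the model; a competing model could make the same four predictions and pass the same checks.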

A stronger test compares the hypotheses of a competing model ("another pathway") with those of the newly proposed model. Any differences between their predictions provide a test capable of distinguishing the models. By weeding out pathways, a social scientist can better understand the model, or reasoning, underlying the actors' decisions.
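The search for discriminating tests amounts to comparing two prediction tables. The sketch below encodes stylized, hypothetical predictions for the two stories about schooling discussed earlier in this post (schooling raises salaries vs. IQ drives both schooling and salary) and returns the hypotheses on which they disagree. The dictionary keys, the +1/0 coding and the function name are illustrative assumptions.

```python
# Stylized, hypothetical predictions: +1 = positive effect, 0 = no effect.
SCHOOLING_STORY = {
    "schooling and salary correlate": +1,
    "schooling raises salary holding IQ fixed": +1,
    "cutoff-induced schooling raises salary": +1,
}
IQ_STORY = {
    "schooling and salary correlate": +1,   # high-IQ students get more school
    "schooling raises salary holding IQ fixed": 0,
    "cutoff-induced schooling raises salary": 0,
}

def discriminating_tests(model_a: dict[str, int], model_b: dict[str, int]) -> list[str]:
    """Hypotheses addressed by both models on which their predictions differ."""
    shared = model_a.keys() & model_b.keys()
    return sorted(h for h in shared if model_a[h] != model_b[h])

print(discriminating_tests(SCHOOLING_STORY, IQ_STORY))
```

The raw correlation drops out: both stories predict it, so observing it cannot tell them apart. Only the hypotheses where the predictions differ are worth taking to the data.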

The problem is that evidence against a model can mean two things. First, the model may be correct but the agents may be behaving irrationally. Second, the model may be incorrect and fail to describe the actors' behavior. This ambiguity in interpreting evidence against a model is a joint hypothesis problem: by testing a model, one is testing both whether the model is correct and whether the actors behave rationally. Distinguishing between the two hypotheses is very difficult without controlled experiments.
