The Challenge of Measuring Water Conservation Programs
Back in 2015, California Governor Jerry Brown issued an executive order mandating that the state's urban water utilities reduce water usage by 25 percent in response to the ongoing drought. The natural next question for utilities was how to achieve the cut. Practically, there are only two types of levers to pull to induce conservation in the short run: increase water prices or implement some sort of non-price intervention (e.g., outdoor watering restrictions, information campaigns, or drought shaming).
Measuring the success of such efforts is important for future planning. So, how reliable are evaluations of water conservation programs? In a new RFF study, Paul Ferraro of Johns Hopkins University and I take up that question by evaluating two different methods of encouraging water conservation. Specifically, we examine the conservation effect of social comparison campaigns (e.g., mailing a letter to utility customers to the effect of, “You used 20 percent more water than your neighbors last month”) and information campaigns (e.g., a letter that says, “Here are some tips to reduce your water usage”). We find that the reliability of such evaluations depends on how well a “control” group mimics the behavior of the “treated” group, a condition that we typically cannot verify directly.
Our study highlights some of the problems that can arise when using messy data sets. The intuition is simple: In an ideal world, we would like to compare apples to apples. But sometimes we have only one apple and maybe a few oranges. The goal is to pick the orange with the most apple-like qualities, and then see how it differs from the apple. Sometimes, comparing apples and apple-like oranges can be insightful, but often we don’t know how far off the comparison is. So, we often rely on simple observational tests (e.g., this orange is round, just like our apple!) to stand in for things that we fundamentally cannot observe (e.g., our apple did not grow on an orange tree).
What does this mean for measuring water conservation? If we compare the water consumption of a household exposed to a conservation program (an apple) to that of a household in a neighboring county without a conservation program (an orange), we might be able to produce a good estimate of how much water was conserved through the program. That estimate assumes apple-like oranges can serve as a good counterfactual for apples (i.e., that they tell us what water consumption would have looked like if the conservation program had never been implemented). However, the results of our paper show that even when an orange looks very similar to an apple, it still doesn’t work as well as the ideal counterfactual (i.e., comparing two apples that happened to fall randomly out of the same tree), highlighting the dangers of relying on assumptions that are fundamentally untestable.
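To make the apples-and-oranges logic concrete, here is a minimal sketch, in Python, of a matching-style comparison. Everything in it is invented for illustration: the household records, the traits we match on, and the distance function are assumptions, not the data or the method from our study.

```python
# A stylized matching comparison: pick the most apple-like orange and use it
# as the counterfactual. All numbers and field names here are made up.

treated = {"pre_use": 4200, "lot_size": 0.25, "post_use": 3800}   # the "apple"

untreated_pool = [                                                 # the "oranges"
    {"pre_use": 4100, "lot_size": 0.30, "post_use": 4150},
    {"pre_use": 6500, "lot_size": 0.50, "post_use": 6600},
    {"pre_use": 3900, "lot_size": 0.20, "post_use": 3950},
]

def distance(a, b):
    """How 'apple-like' an orange is, judged only on traits we can observe."""
    return (abs(a["pre_use"] - b["pre_use"])
            + 10_000 * abs(a["lot_size"] - b["lot_size"]))

# Pick the most similar untreated household...
match = min(untreated_pool, key=lambda orange: distance(treated, orange))

# ...and treat its post-period use as what the treated household would have
# used without the program. The gap is the estimated conservation effect.
estimated_effect = match["post_use"] - treated["post_use"]
print(f"Estimated water saved: {estimated_effect} gallons per month")

# The untestable part: we must assume that, absent the program, the apple
# would have behaved like this orange. No observable test can confirm that.
```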
This finding is problematic because policy usually isn’t implemented as a randomized experiment. Without carefully designed experiments, it is extremely difficult (but not impossible) to say with confidence that “A causes B,” let alone by how much. The good news in this case is that the randomized social comparison experiment, by making households feel morally responsible for conforming to social norms for conserving water, reduced water use by about 5 percent, suggesting the power of non-price initiatives to encourage conservation. Simply providing information on saving water did not have an impact on water use. The bad news, however, is that we would never arrive at that result if we had only oranges to compare to apples. The takeaway? If we want to credibly measure the effectiveness of public policies, perhaps we should, when feasible, first roll out pilot programs with randomized control groups before widespread implementation, so that reliable program evaluations are possible.
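For readers who want to see that takeaway in code, here is a minimal Python sketch of why a randomized pilot makes the comparison credible. The household data and the assumed 5 percent effect are simulated purely for illustration; this is not the experiment or the data from the study.

```python
import random
import statistics

random.seed(0)

# Simulate a hypothetical pilot: 1,000 households' monthly water use (gallons).
# All numbers here are illustrative assumptions, not data from the study.
baseline_use = [random.gauss(4000, 600) for _ in range(1000)]

treated, control = [], []
for use in baseline_use:
    gets_letter = random.random() < 0.5        # random assignment before rollout
    # Assume, purely for illustration, the letter cuts use by about 5 percent.
    observed = use * 0.95 if gets_letter else use
    (treated if gets_letter else control).append(observed)

# Because assignment was random, the control group is an "apple" by
# construction, and a simple difference in means is a credible estimate.
effect = statistics.mean(control) - statistics.mean(treated)
print(f"Estimated reduction: {effect:.0f} gallons "
      f"({effect / statistics.mean(control):.1%} of control-group use)")
```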