Week 29: Evaluation design and unintended consequences, or from firefighting to systematic action


This week’s blog is from Jonny Morell, editor of Evaluation and Program Planning and author of Evaluation in the Face of Uncertainty: Anticipating Surprise and Responding to the Inevitable. He blogs at http://evaluationuncertainty.com/.

Jonny examines the issue of unintended consequences, which often take evaluators by surprise, and suggests a way to plan for the unexpected.

Why are unexpected outcomes problematic for evaluation?

I’m interested in how to apply strong evaluation designs to programs that yield unexpected outcomes. My problem is that to work well, many designs have to maintain their integrity over time. So what happens when a design is optimized for one expected set of outcomes, and others pop up instead, or in addition?

Here is an example I made up to illustrate what I mean. 

Example of community development and unexpected outcomes

A novel program to develop linkages among civil society organizations proceeds along two lines. First, training is given to any and all organizations that wish to send people to a few sessions. Second, two pairs of critical organizations are singled out for intensive assistance. These are: 1) the health and mental health systems, and 2) better nutrition advocates and the elementary schools. This is certainly not an unreasonable way to proceed.

A likely evaluation design

What is the program theory? That certain connections matter, and by implication, other connections do not. What outcomes will occur if the important connections are enriched? In the short run specific outcomes will result, as determined by a working group of stakeholders, and expressed in a logic model. For instance, continuity of care will improve for people being treated for both health and mental health conditions. In the long run there may be change at the community level, e.g. resident satisfaction with government or social services.

What is the evaluation design?
  1) Measure the short term outcomes at baseline, and then every year for three years. Put some effort and resources into making sure those are valid measures.
  2) Interview people once a year to see if they think the community is changing.
  3) Pick a few neighboring communities as control groups, and measure the short term outcomes in those control communities. No assessment of training required.
This is not such a bad design. It does a thorough job of evaluating outcomes that may occur during the timeframe of the program. It focuses on what people think is important. It provides hints at other outcomes. There is a nice time series. It has multiple comparison groups.

An alternate scenario

A design like the above is what I would do. Or at least, it is what I would have done before I started reading about adaptive co-evolutionary networks. Then I started thinking about new program theory and new evaluation design.

Imagine that the training brought diverse groups together and that linkages began to form, i.e. network connections were established and strengthened. It may be the case that real community impact is a function of those new network connections, regardless of which specific connections are developed. And to make matters more interesting, it may well be that change at the community level takes place suddenly. Lots and lots of connections develop. Nothing happens. Then poof! There is a state change and community level indicators jump from low to high in a short period of time. I have no idea if something like this would happen, but based on what I know about network behavior, it’s a somewhat reasonable expectation.
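The sudden jump described above matches a well-known property of random networks: as connections accumulate past a threshold, a single "giant" connected component appears abruptly rather than gradually. The sketch below is my illustration, not part of the original post, and all names and parameters in it are mine; it adds random links one at a time and checks what share of the community sits in the largest connected cluster.

```python
import random

def largest_component_fraction(n, edges):
    """Share of the n nodes that sit in the largest connected component (union-find)."""
    parent = list(range(n))

    def find(x):
        # Path-halving: walk to the root, shortening the chain as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n

def simulate(n=1000, checkpoints=(200, 400, 600, 800), seed=1):
    """Add random connections one at a time; report largest-cluster share at checkpoints."""
    random.seed(seed)
    edges = []
    results = {}
    for step in range(1, max(checkpoints) + 1):
        edges.append((random.randrange(n), random.randrange(n)))
        if step in checkpoints:
            results[step] = largest_component_fraction(n, edges)
    return results
```

With 1,000 nodes, the largest cluster stays tiny while connections are few, then grows to well over a third of the network once the number of links passes roughly half the number of nodes. If community impact rides on that transition, a measurement schedule built for gradual change could easily miss it.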

So what happened to my original evaluation design? My measurement system was optimized to assess specific changes from specific connections. Oops. What really mattered were community level changes. Sure, my design nods in the direction of those changes, but my money, logistics, measurement system and intellectual capital all focus on outcomes conjured through logic models developed by the health, mental health, school, and child nutrition stakeholders.

I got it backwards. Important community level changes emerged from an unspecifiable tangle of connections. I have no data on which network connections developed, or how strong they were, or what directionality they had, or the topology of the network. I have no data on how the network connections affected the nodes, which in turn influenced the shape of the network. I have no data on when the change happened (and thus the state of the network when the change happened) because I’m only collecting data once a year.
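The once-a-year problem can be made concrete with a little arithmetic: sampling at interval k can only bracket a sudden change to within a window of width k. A minimal sketch (my illustration, with hypothetical names; it assumes measurements on days 0, k, 2k, ...):

```python
def detection_window(change_day, sample_every):
    """Days of the last measurement before a step change and the first one after it,
    assuming measurements are taken on days 0, sample_every, 2*sample_every, ..."""
    last_before = (change_day // sample_every) * sample_every
    return last_before, last_before + sample_every

# The same change on day 500, observed under two measurement schedules:
annual = detection_window(500, 365)   # → (365, 730): located only to within a year
monthly = detection_window(500, 30)   # → (480, 510): located to within a month
```

Annual data collection can say only that the state change happened somewhere in a 365-day window, which tells us almost nothing about what the network looked like when it tipped.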

Don’t get me wrong. I’m not claiming that these network effects would happen, or that nurturing specific relationships is unimportant. I’m only saying that it’s not crazy to think they might, and if they did, all those community level changes would not be evaluated very well.

This is a classic problem of unintended consequences. We optimized a program theory, a design, and a measurement system that missed the important outcomes. When those outcomes popped up, the evaluation we had left could provide weak knowledge at best. I don’t know anyone who has been doing evaluation for any length of time who hasn’t been hit over the head with a problem that had some of the elements in my example. What do we do when it happens? We are clever folk and we always manage to find some way out of the corner we painted ourselves into. This is crisis management. We can do better. We can shift from crisis management to systematic planning for surprise.

The way out

I have studied quite a few case studies of evaluation surprise. That led me to a set of beliefs that are summarized in the picture. I begin with the idea that there is a continuum. At one end are program outcomes that might have been reasonably anticipated. At the other are outcomes that are truly unpredictable because they result from the behavior of complex systems. In the middle are outcomes that are less and less foreseeable. Different tactics are useful along different parts of the continuum. (There are all kinds of reasons why a continuum like this does not exist, but the fiction works, so I use it.)

Foreseeable end
  • Use theory and experience to anticipate what may happen.
  • Look carefully at past research and the experience of others. This seems obvious, but it’s amazing how often it is not done.

Middle of the spectrum
  • It is possible to detect incipient changes that may affect outcomes. The longer the lead time between detection and occurrence, the greater the chance of being able to choose from a wide variety of evaluation methods.
  • Use forecasting, monitoring and system-based logic modelling on a deliberately scheduled basis.

Unpredictable end
  • When all else fails, pretend the program theory fits what the data can test. This sometimes leads to very useful evaluation information.
  • Choose methods that are flexible, but be aware of the trade-offs, e.g. do you use a well-validated scale or a structured interview?


I have a presentation that goes through this picture in detail.

Nothing I’m suggesting is a new methodology. What I am suggesting is that we need to think of these choices systematically with a focus on implications for evaluating unexpected consequences. There is no silver bullet but we can chip away at the problem, and we will be much better off for it.
