Week 29: Evaluation design and unintended consequences – or, from firefighting to systematic action

14th July 2014 by Jonny Morell

This week’s blog is from Jonny Morell, editor of Evaluation and Program Planning and author of Evaluation in the Face of Uncertainty: Anticipating Surprise and Responding to the Inevitable. He blogs at http://evaluationuncertainty.com/. Jonny examines the issue of unintended consequences, which often take evaluators by surprise, and suggests a way to plan for the unexpected.

Why are unexpected outcomes problematic for evaluation?

I’m interested in how to apply strong evaluation designs to programs that yield unexpected outcomes. My problem is that to work well, many designs have to maintain their integrity over time. So what happens when a design is optimized for one expected set of outcomes, and others pop up instead, or in addition?

Here is an example I made up to illustrate what I mean. 

Example of community development and unexpected outcomes

A novel program to develop linkages among civil society organizations proceeds along two lines. First, training is given to any and all organizations that may wish to send people to a few sessions. Second, two pairs of critical organizations are singled out for intensive assistance. These are: 1) the health and mental health systems, and  2) better nutrition advocates and the elementary schools. This is certainly not an unreasonable way to proceed.

A likely evaluation design

What is the program theory? That certain connections matter, and by implication, other connections do not. What outcomes will occur if the important connections are enriched? In the short run specific outcomes will result, as determined by a working group of stakeholders, and expressed in a logic model. For instance, continuity of care will improve for people being treated for both health and mental health conditions. In the long run there may be change at the community level, e.g. resident satisfaction with government or social services.

What is the evaluation design? 1) Measure the short term outcomes at baseline, and then every year for three years. Put some effort and resources into making sure those are valid measures. 2)  Interview people once a year to see if they think the community is changing. 3) Pick a few neighboring communities as control groups. Measure the short term outcomes in those control communities. No assessment of training required. This is not such a bad design. It does a thorough job of evaluating outcomes that may occur during the timeframe of the program. It focuses on what people think is important. It provides hints at other outcomes. There is a nice time series. It has multiple comparison groups.

An alternate scenario

A design like the above is what I would do. Or at least, it is what I would have done before I started reading about adaptive co-evolutionary networks. Then I started thinking about new program theory and new evaluation design.

Imagine that the training brought diverse groups together and that linkages began to form, i.e. network connections were established and strengthened. It may be the case that real community impact is a function of those new network connections, regardless of which specific connections are developed. And to make matters more interesting, it may well be that change at the community level takes place suddenly. Lots and lots of connections develop. Nothing happens. Then poof! There is a state change and community level indicators jump from low to high in a short period of time. I have no idea if something like this would happen, but based on what I know about network behavior, it’s a somewhat reasonable expectation.
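To see why a sudden jump is a reasonable expectation, here is a toy sketch (my own illustration, not part of the program described above) using random link formation among nodes. In random-graph percolation, the size of the largest connected cluster stays small as links accumulate, then jumps sharply once the average number of links per node crosses a threshold – the kind of state change described above.

```python
import random

def largest_component_fraction(n, n_edges, seed=0):
    """Grow a random network by adding edges one at a time, then
    return the fraction of nodes in the largest connected component."""
    rng = random.Random(seed)
    parent = list(range(n))  # union-find structure tracking components

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(n_edges):
        a, b = rng.randrange(n), rng.randrange(n)
        parent[find(a)] = find(b)  # merge the two components

    sizes = {}
    for node in range(n):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

# With 1000 nodes, the largest cluster stays tiny until roughly
# 500 edges, then jumps to a large fraction of the whole network.
for n_edges in (100, 400, 600, 1000):
    print(n_edges, round(largest_component_fraction(1000, n_edges), 2))
```

The point for evaluation design: an annual measurement schedule can easily straddle the jump and miss both its timing and the state of the network when it happened.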

So what happened to my original evaluation design? My measurement system was optimized to assess specific changes from specific connections. Oops. What really mattered were community level changes. Sure, my design nods in the direction of those changes, but my money, logistics, measurement system and intellectual capital all focus on outcomes conjured through logic models developed by health, mental health, school, and child nutrition stakeholders.

I got it backwards. Important community level changes emerged from an unspecifiable tangle of connections. I have no data on which network connections developed, or how strong they were, or what directionality they had, or the topology of the network. I have no data on how the network connections affected the nodes, which in turn influenced the shape of the network. I have no data on when the change happened (and thus the state of the network when the change happened) because I’m only collecting data once a year.

Don’t get me wrong. I’m not claiming that these network effects would happen, or that nurturing specific relationships is unimportant. I’m only saying that it’s not crazy to think they might, and if they did, all those community level changes would not be evaluated very well.

This is a classic problem of unintended consequences. We optimized a program theory, a design, and a measurement system that missed the important outcomes. When those outcomes popped up, the evaluation we had left could provide weak knowledge at best. I don’t know anyone who has been doing evaluation for any length of time who hasn’t been hit over the head with a problem that had some elements in my example. What do we do when it happens? We are clever folk and we always manage to find some way out of the corner we painted ourselves into. This is crisis management. We can do better. We can shift from crisis management to systematic planning for surprise. 

The way out

I have studied quite a few cases of evaluation surprise. That led me to a set of beliefs that are summarized in the picture. I begin with the idea that there is a continuum. At one end are program outcomes that might have been reasonably anticipated. At the other are outcomes that are truly unpredictable because they result from the behavior of complex systems. In the middle are outcomes that are less and less foreseeable. Different tactics are useful along different parts of the continuum. (There are all kinds of reasons why a continuum like this does not exist, but the fiction works, so I use it.)

Foreseeable end
Use theory and experience to anticipate what may happen.
  • Look carefully at past research and the experience of others. This seems obvious, but it’s amazing how often it is not done.

Middle of the spectrum
It is possible to detect incipient changes that may affect outcomes – the longer the lead time between detection and occurrence, the greater the chance of being able to choose from a wide variety of evaluation methods.
  • Use forecasting, monitoring and system-based logic modelling on a deliberately scheduled basis.

Unpredictable end
When all else fails:
  • Pretend the program theory fits what the data can test – this sometimes leads to very useful evaluation information.
  • Choose methods that are flexible (but be aware of the trade-offs), e.g. use a well-validated scale or a structured interview?

I have a YouTube video that goes through this picture in detail.

Nothing I’m suggesting is a new methodology. What I am suggesting is that we need to think of these choices systematically with a focus on implications for evaluating unexpected consequences. There is no silver bullet but we can chip away at the problem, and we will be much better off for it.

Read more

You'll find some options for identifying potential unintended outcomes at Identify Potential Unintended Results in the Rainbow Framework.

A special thanks to this page's contributors
Jonny Morell, Director of Evaluation, Fulcrum Corporation and Editor of Evaluation and Program Planning.
Ann Arbor, United States of America.


Bojan Radej

Unintended consequences are usually treated unsystematically and ad hoc in evaluation (firefighting). An unsystematic approach is very dangerous and usually results in an even more complicated and messy evaluation. This should be avoided, so we need theoretically supported thinking about how to include unintended impacts in evaluation. The term “unintended” means beyond somebody’s direct control, so we may do better to distinguish between primary impacts – targeted at the goal of the concerned agent – and secondary impacts, which relate to impacts on the goals of all others (such as legitimate stakeholders) rather than on the concerned agent’s goal. Evaluation of primary and secondary impacts can be accomplished integrally with a ‘mesoscopic evaluation approach’ (evaluation theory) – for example, here in the case of evaluation synthesis (between intended and unintended impacts of an evaluated program). In the proposed approach, all evaluation findings (such as those obtained with the Leopold impact matrix) are first expressed on the same range of values (such as 1-5); they are then grouped into three (to four) evaluation domains, depending on the context. These are further correlated to obtain mesoscopically synthesised results, which are emergent and so provide deeper insight into the evaluated object – in this way increasing the effectiveness of evaluation (evaluation design). The results obtained are hybrid in nature, which also gives them the capacity to bridge oppositions in evaluation and to resolve conflicts in the public domain in an overlapping way. In this way, evaluation of secondary impacts is not only “important” in evaluations; it is essential to the very possibility of integrative and holistic evaluation of complex public interventions, which are characterised by their intensive unintended impacts. See also here and here (backgrounds). Video here.

Jonathan A. Morell

The posts above have sparked quite a few thoughts in my head. In no particular order, here they are.

NEGATIVE CONSEQUENCES: I completely agree about needing to watch out for negative outcomes. In fact the darker side of me believes most unintended outcomes will be negative. I can justify this opinion in terms of some of my thinking about programs as organisms competing on fitness landscapes, but I have to admit that I usually lose these arguments. (I’m still right though, of course.)

Another issue is that the people we work for usually don’t like to hear about the negative consequences of what they have done. There is strong pressure on us (well on me at least), to shy away from bad news.

MEETING PROGRAM GOALS: it’s easy enough to find cases where programs have done good things that were not directly related to program goals. One would think funders would be happy about this, but I’m not so sure. If I give money to improve road infrastructure, and the investment also paid off in terms of girls’ education, would I care? I live in the Ministry of Infrastructure. I am embedded in its culture, politics, funding sources, internal process, human capital and physical location. It’s all I can do to get funding to get that road built. The fact that I managed it ranks me as being phenomenally successful at bringing about a social good. Most other people would fail at it. Now I need to leverage my success into getting more roads built. As a person I’m glad about those schools. But as a political/social entity, those schools matter not a whit to me. I am not saying this is a bad thing. I can muster arguments in favor. All I want is to make the point that there are some juicy issues in thinking about program goals.

This raises another problem. If I stumble across an unintended effect, can I do a good job of evaluating it? Sometimes yes, but sometimes no, because I have not put the proper evaluation mechanism into place. What if getting a good understanding of how the road affected education required a qualitative time series design involving interviewing kids and their parents? That possibility is gone, along with what would have been a powerful evaluation design.

DISCERNING POTENTIAL CONSEQUENCES: Sometimes it is true that knowing assumptions and beliefs will shed light on consequences. So too will a good literature review, attention to social science theory, and the opinion of a diverse set of experts. But this is not always the case. A foundation of my book on this topic is the idea of a continuum between “what might have reasonably been foreseen”, to that which is “impossible to anticipate because events emanate from the behavior of complex systems”. (A continuum like this is actually impossible, but it’s a useful fiction.) Different tactics are needed along the continuum. “Good ongoing monitoring and/or an interim evaluation” is critical because it can increase the time an evaluator has to retool the evaluation to meet the new circumstances.

FIRE FIGHTING: “Unintended consequences are usually treated unsystematically and ad hoc in evaluation (firefighting).” Too true. One of my major goals in writing all this stuff is to help move the field from firefighting to systematic action. As I am fond of saying, “unintended does not mean random.” There are very deliberate actions one can take to address these issues.

INCOMMENSURABLE GOALS: I have mixed feelings about this. Let’s take my previous example. I have built a road that resulted in improved travel and better education. I know by how much in each case. From a policy point of view, does it do any good to collapse those improvement metrics into a single number? Don’t the Infrastructure and Education ministries need their own estimates? On the other hand, we do (or at least we should) care about “total impact”. Sure, the organizational structure of government and the politics of interest groups want to ignore “total impact”. But that is reason enough to include “total impact” into the public discourse. My view is that it is a complicated world, and good decision making and good understanding of events requires multiple metrics.

As for how to aggregate the outcomes: “grouped into three (to four) evaluation domains, depending on the context” sounds like a concept mapping problem to me, but I’m sure there are other ways to do it. I plan to do some reading on Bojan’s references.

As for “expressed on the same range of values (such as 1-5)”, there is a formal mathematical way to do things like this called the “Analytic Hierarchy Process”. I have not used it yet, but I have been reading about it and I’m going to apply it in some upcoming work on R&D needs assessment. If any of you have used AHP in an evaluation setting please get in touch with me. I’ll take any advice I can get.
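For readers unfamiliar with AHP, the core computation is small: criteria are compared pairwise on Saaty's 1-9 scale, and priority weights are extracted from the comparison matrix. A minimal sketch follows, using the standard geometric-mean approximation to the principal eigenvector; the comparison values and the "R&D needs" framing are invented for illustration.

```python
import math

def ahp_weights(matrix):
    """Approximate AHP priority weights via the geometric-mean method,
    a standard shortcut for the principal eigenvector of the matrix."""
    gms = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(gms)
    return [g / total for g in gms]

# Invented pairwise comparisons among three R&D needs (Saaty 1-9 scale):
# matrix[i][j] = how much more important need i is than need j,
# so matrix[j][i] must be its reciprocal.
comparisons = [
    [1,     3,     5],    # need A compared with A, B, C
    [1 / 3, 1,     3],    # need B
    [1 / 5, 1 / 3, 1],    # need C
]

weights = ahp_weights(comparisons)
print([round(w, 3) for w in weights])  # priorities sum to 1, A > B > C
```

A full AHP application would also check the consistency ratio of the judgments before trusting the weights; this sketch omits that step.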

Bojan Radej

“IN FACT THE DARKER SIDE OF ME BELIEVES MOST UNINTENDED OUTCOMES WILL BE NEGATIVE”. Unintended impacts are unintended only from one specific and narrow point of view; policy impact evaluation usually aims to go beyond specific views and incorporate a broader perspective. Thus, for the majority of the population, the majority of policy impacts are unintended, so these are of central importance in the evaluation of wide social policy impacts. I agree that the majority of unintended impacts in evaluation practice are negative, because these are most fiercely brought forward by those most negatively affected, so positive unintended impacts usually remain hidden from the evaluator. Nevertheless, positive and negative unintended impacts together are enormously important in aggregate as an indicator of policy synergy.

“THERE IS STRONG PRESSURE ON US (WELL ON ME AT LEAST), TO SHY AWAY FROM BAD NEWS.” A client needs to have it explained that synergetic interventions are in his/her own basic interest – since they enhance long-term efficiency and contribute to strengthening his/her strategic position.

“IF I GIVE MONEY TO IMPROVE ROAD INFRASTRUCTURE, AND THE INVESTMENT ALSO PAID OFF IN TERMS OF GIRLS’ EDUCATION, WOULD I CARE?” Excellent point! Synergies are not important for sector-based policy-making or for narrow-minded bureaucrats in ministries. This is exactly why many well-intended programs financed by public money fail so flatly – they invest in public goods as a fragmented puzzle, while citizens perceive their achievements integrally, as how policy improves their wellbeing. A bureaucrat should be aware that synergies work in all directions – just imagine his/her joy if the education of girls somehow contributed to improved mobility without the need to spend a cent from the transport ministry’s budget. If she/he manages to get other ministries to invest in projects that improve mobility, she/he is not only successful but also terribly smart. This is also the approach taken by the EU Commission in its new programming period (smart growth) – just another reason why bureaucrats should pay much more attention to unintended policy impacts.

“WHAT IF GETTING A GOOD UNDERSTANDING OF HOW THE ROAD AFFECTED EDUCATION REQUIRED A QUALITATIVE TIME SERIES DESIGN INVOLVING INTERVIEWING KIDS AND THEIR PARENTS?” This is a technical concern which will not affect the strategic decision of whether or not to evaluate unintended impacts. I think data availability is the smaller problem – the larger challenge is to choose an appropriate methodological approach that allows evaluation with the existing set of data. Not the analysis of data but their synthesis is decisive for appropriate evaluation of unintended impacts. In this regard, Scriven has discussed the aggregation problem in evaluation.

“TO MOVE THE FIELD FROM FIREFIGHTING TO SYSTEMATIC ACTION”. Fully agree; in my opinion, reluctance to evaluate unintended policy impacts is much more linked to old ways of thinking (in evaluation and policy making) than to a lack of appropriate evaluation methodologies (complexity theory, aggregative).

“TOTAL IMPACT”. Multiple metrics (for incommensurable goals) are certainly needed for impact evaluation of complex policy situations, but this does not mean that the evaluator cannot obtain some sort of total impact indicator at the end of the evaluation – of course, this will not be compiled as a singular (compound) value but probably as a correlated set of partial results. I also wish to distinguish methodologically between complicated and complex situations: complicated situations call for a systemic approach, while incommensurability calls for a complex approach to evaluation (according to the “Cynefin” framework).

“MAPPING, HOW TO AGGREGATE THE OUTCOMES”. Mapping is needed only if the evaluator is not given the groupings already in the TOR – but these are usually given: say, for sustainable development the evaluator evaluates economic, social and environmental impacts; for territorial cohesion, the spatial, physical and economic subsystems’ impacts; or, in the knowledge triangle, education, research and knowledge impacts.

“ANALYTICAL HIERARCHY PROCESS”. Many thanks for reminding me of AHP – this is a new methodology for me as well.
