Realist evaluation


Realist evaluation aims to identify the underlying generative causal mechanisms that explain how outcomes were caused and how context influences these.


Realist evaluation focuses on answering the question: “What works, for whom, in what respects, to what extent, in what contexts, and how?”.

Pawson and Tilley (1997), who initially developed the realist evaluation approach, argued that in order to be useful for decision makers, evaluations need to identify ‘what works in which circumstances and for whom?’, rather than simply asking ‘does this intervention work?’ Pawson and Tilley proposed that a realist evaluation should produce ‘context-mechanism-outcome’ (CMO) statements along the lines of “In this context, that particular mechanism fired for these actors, generating those outcomes. In that context, this other mechanism fired, generating these different outcomes.” Others using realist evaluation have incorporated features of the intervention as an additional element of the CMO configurations, to give the formulation C+I+M=O (Punton & Vogel, 2020), or ICAMO: Intervention-Context-Actor-Mechanism-Outcome (Mukumbang et al., 2018).

The realist understanding of how programmes work

Realist philosophy (Pawson and Tilley use the term ‘scientific realism’) considers that an intervention works (or not) because actors make particular decisions in response to the intervention (or not). The ‘reasoning’ of the actors in response to the resources or opportunities provided by the intervention is what causes the outcomes.

Strictly speaking, the term ‘generative mechanism’ refers to the underlying social or psychological drivers that ‘cause’ the reasoning of actors. For example, a parenting skills programme may have achieved different outcomes for fathers and mothers. The mechanism generating different ‘reasoning’ by mothers and fathers may relate to dominant social norms about the roles and responsibilities of mothers and fathers. Additional mechanisms may be situated in psychological, social or other spheres.

Context matters: firstly, it influences ‘reasoning’ and, secondly, generative mechanisms can only work if the circumstances are right. Going back to our example, there may be different social beliefs about the roles and responsibilities of mothers and fathers in different cultures, which may affect how parents respond to the parenting programme. Whether parents can put their new learning into practice will depend on a range of factors – perhaps the time they have available, their own beliefs about parenting, or their mental health. Finally, the context may provide alternative explanations of the observed outcomes, which need to be taken into account during the analysis.

Undertaking a realist evaluation

Developing the initial programme theory

Realist evaluation starts with theory and ends with theory. In other words, the purpose of a realist evaluation is as much to test and refine the programme theory as it is to determine whether and how the programme worked in a particular setting.

The programme theory describes how the intervention is expected to lead to its effects and in which conditions it should do so. The initial programme theory may be based on previous research, knowledge, experience, and the assumptions of the intervention designers about how the intervention will work. The difference between realist and other programme theory-based evaluation approaches is that a realist programme theory specifies what mechanisms will generate the outcomes and what features of the context will affect whether or not those mechanisms operate. Ideally, these elements (mechanisms, outcomes, context) are made explicit at the evaluation design stage, as this allows the data collection to be designed to test the different elements of the programme theory.

Choosing the evaluation methods

Realist evaluation is method-neutral (i.e., it does not impose the use of particular methods).

As with any evaluation, the choice of data collection and analysis methods and tools should be guided by the types of data needed to answer the evaluation questions or, more specifically, to test the initial programme theory in all its dimensions.

Usually, both quantitative and qualitative data are collected in a realist evaluation, often with quantitative data being focused on context and outcomes and qualitative data on generative mechanisms. Because the realist analysis uses mainly intra-programme comparisons (i.e., comparisons across different groups involved in the same programme) to test the initial theory, a realist evaluation design does not need to construct comparison groups. Instead, the refined programme theory will be tested in a different context in a later study. The case study design is often used, whereby case selection is typically purposive, as the cases should enable ‘testing’ of the initial programme theory in all its dimensions.

Using a realist data analysis approach

Realist data analysis is driven by the principles of realism: realist evaluation explains change brought about by an intervention by referring to the actors who act and change (or not) a situation under specific conditions and under the influence of external events (including the intervention itself). The actors and the interventions are considered to be embedded in a social reality that influences how the intervention is implemented and how actors respond to it (or not). The context-mechanism-outcome (CMO) configuration is used as the main structure for realist analysis.  

In the first phase of analysis, data are organised in relation to the initial programme theory – whether the data relate to what was done (the intervention activities) or to context, mechanism, outcome and (groups of) actors. Qualitative data are coded and appropriate methods for analysing quantitative data are applied. The data on outcomes are disaggregated by sub-groups (which were selected based on the programme theory).
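
The disaggregation step described above can be illustrated with a minimal sketch. It assumes each outcome record carries a sub-group label and a numeric outcome score; the sub-group names echo the parenting example used earlier, and the scores are invented placeholders, not data from any evaluation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical outcome records: (sub-group, outcome score).
# Sub-groups would be chosen from the programme theory; values are illustrative.
records = [
    ("mothers", 4), ("mothers", 5), ("mothers", 3),
    ("fathers", 2), ("fathers", 3), ("fathers", 1),
]


def disaggregate(records):
    """Group outcome scores by sub-group and return the mean score per group."""
    by_group = defaultdict(list)
    for group, score in records:
        by_group[group].append(score)
    return {group: mean(scores) for group, scores in by_group.items()}


outcome_by_group = disaggregate(records)
for group, avg in outcome_by_group.items():
    print(f"{group}: mean outcome score {avg}")
```

A difference between sub-groups in such a summary is only a pattern to be explained; the realist analysis then asks which mechanisms and contexts account for it.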

Once patterns of outcomes are identified, the mechanisms generating those outcomes can be analysed, provided the right kinds of data are available. The contexts in which particular mechanisms did or did not ‘fire’ can then be determined. Contexts may relate to the sub-groups for whom outcomes were generated and to other stakeholders, implementation processes and organisational, socio-economic, cultural and political conditions.

The analytic process is not necessarily sequential but should result in a set of ‘context-mechanism-outcome’ (CMO) statements: “In this context, that particular mechanism fired for these actors, generating those outcomes. In that context, this other mechanism fired, generating these different outcomes.”
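
One way to keep CMO statements consistent during analysis is to record each configuration as a small structured record and render the statement from it. The sketch below is an assumption about how an analyst might do this, not a prescribed realist tool; the class name, field names and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CMO:
    """One context-mechanism-outcome configuration (hypothetical structure)."""
    context: str
    mechanism: str
    actors: str
    outcome: str

    def statement(self) -> str:
        # Mirrors the sentence pattern "In this context, that mechanism
        # fired for these actors, generating those outcomes."
        return (
            f"In {self.context}, {self.mechanism} fired for "
            f"{self.actors}, generating {self.outcome}."
        )


# Illustrative values only, loosely based on the parenting example above.
example = CMO(
    context="settings where childcare is seen as a shared responsibility",
    mechanism="a sense of parental responsibility",
    actors="fathers",
    outcome="regular use of the new parenting skills",
)
print(example.statement())
```

Keeping the four elements as separate fields makes it easier to compare configurations across sub-groups or sites before settling on the wording of the final statements.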

The last phase of the analysis consists of determining which CMO configuration(s) offers the most robust and plausible explanation of the observed pattern of outcomes. This resulting CMO configuration is then compared with the initial programme theory, which is modified (or not) in light of the evaluation findings.

Using the findings from realist evaluation

Both generative mechanisms and programme theories can be considered at different levels of abstraction, from very specific (particular individuals within particular programmes) to quite abstract (across different kinds of programmes). Pawson and Tilley argued that ‘middle range theories’* (MRT) are most useful. MRTs are specific enough to generate particular propositions to test and general enough to apply across different situations. Typically, MRTs develop over time, based on the accumulation of insights acquired through a series of studies allowing gradual specification of the realist findings. All kinds of theory in realist evaluation – programme theory, theories about particular mechanisms, CMOs, and formal theory – are most useful if developed at a middle level of abstraction.

Because realist evaluation uses the idea of generative causality (i.e. mechanisms only fire when the context is conducive), realists are modest in their claims, stating that an evaluation cannot produce universally applicable findings. At best, evaluation can make sense of the complex processes underlying programmes by formulating plausible explanations ex-post. It can indicate the conditions in which the intervention works (or not) and how it does so. This realistic specification allows decision makers to assess whether interventions that proved successful in one setting may be so in another setting and assists programme planners in adapting interventions to suit specific contexts.

*A middle range theory is understood as a “theory that lies between the minor but necessary working hypotheses …and the all-inclusive systematic efforts to develop a unified theory that will explain all the observed uniformities of social behavior, social organization and social change” (Merton, 1968, p 39).  In essence, “middle range” refers to the degree of abstraction and can refer to programme theory, generative mechanisms, CMOs, formal theories etc.

Features consistent with a causal pathways perspective

A causal pathways perspective on evaluation focuses on understanding how, why, and under what conditions change happens or has happened. It is used to understand the interconnected chains of causal links which lead to a range of outcomes and impacts. These causal pathways are likely to involve multiple actors, contributing factors, events and actions, not only the activities associated with the program, project or policy being evaluated or its stated objectives.

Realist Evaluation can be used in ways which incorporate the following features of a causal pathways perspective:


  • Valuing actors’ narratives: Realist evaluation explicitly tests hypothesised theories of change by collecting information from respondents about contexts, mechanisms and/or outcomes and the program theory.
  • Addressing power and inclusion: Realist evaluation has the potential to address power and inclusion by ensuring that the sampling strategy is designed to involve disenfranchised communities and seek information about context and mechanisms leading to change.
  • Articulates detailed and nuanced change pathways: Realist evaluation provides a detailed analysis of the outcomes that result from changes in actors' reasoning in response to an intervention, as well as the underlying generative mechanisms (psychological or social drivers) that influence changes in reasoning.
  • Pays attention to a range of outcomes and impacts: Realist evaluation seeks to explain intended and unintended as well as positive and negative outcomes.
  • Uses an iterative, bricolage approach to evaluation design: The data analysis is iterative, initial program theories are tested, and earlier stages of analysis are used to refine the evaluation design for later stages and/or the program theory.
  • Helps understand contextual variation: Understanding context is central to realist evaluation, which asks, “What works, for whom, in what respects, to what extent, in what contexts, and how?”. Context influences causal mechanisms, and therefore, outcomes can only be achieved if the circumstances are right.
  • Draws on a range of causal inference strategies: Realist evaluations can use many causal inference methods, including process tracing, pattern matching and comparing to predictive models.
  • Taking a complexity-appropriate approach to evaluation quality and rigour: Clear descriptions and justifications of how the analysis has used data to develop, confirm, refute or refine one or more programme theories contribute to credibility and rigour. The findings of realist evaluations about why interventions work (or don’t work) in specific contexts support decisions about transferring a program to a different setting.


Realist evaluation of the Building Capacity to Use Research Evidence (BCURE) programme

The programme

The Building Capacity to Use Research Evidence (BCURE) programme was a £15.7 million initiative funded by the UK Department for International Development (DFID) from 2013 to 2017. Six implementing partners worked with governments in 12 low- and middle-income countries in Africa and Asia to strengthen the capacity of civil servants and parliamentarians to use research evidence in decision-making by building the skills, incentives, and systems required to access, appraise and apply evidence.

BCURE used a range of interventions designed and combined in different ways – including training, mentoring, policy dialogues, and technical support to develop evidence tools and guidelines. Projects ranged in scope and scale, some based in single ministries and others spanning whole government systems.

The evaluation

DFID commissioned an evaluation to run in parallel with the programme, conducted by an independent team from Itad from 2014 to 2017. The evaluation aimed to understand whether BCURE worked and how and why capacity building can contribute to the increased use of evidence in policymaking in the very different contexts in which the programme operated, informing programming decisions within and beyond DFID.

This emphasis on ‘how and why’ BCURE worked implied the need for an evaluation approach that made use of theory. Realist evaluation was selected over other theory-based approaches because it is particularly useful in pilot programs where it is not clear exactly how and why the intervention will work, where programs are implemented across multiple settings and are likely to lead to different outcomes for different groups in diverse contexts, and where commissioners are interested in learning lessons about how to scale up or roll out an intervention.

How we applied a realist approach

The BCURE evaluation was conceptualised in terms of three broad stages: developing theory, testing theory, and refining theory. These stages were iterative rather than linear, with the theory developed, tested, refined, and tested again as knowledge accumulated.

Developing theory

A draft programme theory was developed at the beginning of the evaluation, drawing on the team’s existing knowledge and professional hunches about the nature of capacity building and how it can contribute to evidence use in policymaking. These insights shaped the research questions for a realist literature review, which identified additional theories and possible mechanisms in the wider literature. These were used to develop a more detailed programme theory and the first iteration of CMO configurations. Features of the intervention were incorporated as an additional element to the CMO configurations to give the formulation C+I+M=O. This helped to separate intervention factors that BCURE controlled (such as training design or length) from contextual factors that it did not (such as trainees' attitudes).

Testing theory

Data was collected in six of the twelve BCURE countries – Bangladesh, Kenya, Pakistan, Sierra Leone, South Africa, and Zimbabwe. The evaluation drew primarily on qualitative data in the form of semi-structured interviews with 567 stakeholders over three years, using tailored topic guides to test the latest set of CIMOs. The evaluation also drew on monitoring data, programme documentation, and, where possible, government documents, such as tools and policies developed in collaboration with BCURE partners.

Refining theory

CIMO configurations were revised through an iterative process. A customised Microsoft Excel database was used to analyse data at a country level, with rows representing the outcomes referenced by interview respondents and columns capturing stakeholders’ descriptions of how and why the outcome did or did not come about, as intervention, context, and mechanism factors. Each year, evidence from the country level was synthesised to explore how and why different interventions contributed to different patterns of outcomes in different contexts. The synthesis involved the whole evaluation team via analysis workshops to identify patterns, concepts, and metaphors that applied across the cases and interrogate differences. This process produced an evidence-based set of refined CIMOs.
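
The layout of such an evidence matrix can be sketched in code. The example below is not the evaluation team's database: it assumes a simplified table with one row per outcome referenced by a respondent and one column each for the intervention, context and mechanism factors, and all country labels and cell values are invented placeholders.

```python
import csv
import io
from collections import Counter

# Hypothetical column layout for a country-level evidence matrix.
COLUMNS = ["country", "outcome", "intervention", "context", "mechanism"]

# Placeholder rows; not data from the BCURE evaluation.
rows = [
    {"country": "Country A", "outcome": "tool adopted",
     "intervention": "co-produced tool", "context": "recognised problem",
     "mechanism": "showcasing value"},
    {"country": "Country B", "outcome": "tool adopted",
     "intervention": "co-produced tool", "context": "high-priority policy",
     "mechanism": "showcasing value"},
    {"country": "Country B", "outcome": "tool not used",
     "intervention": "externally designed tool", "context": "low priority",
     "mechanism": "none observed"},
]


def to_csv(rows):
    """Serialise the matrix as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def mechanism_counts(rows):
    """Count how often each mechanism is cited across all rows."""
    return Counter(row["mechanism"] for row in rows)


print(to_csv(rows))
print(mechanism_counts(rows).most_common())
```

A tally like this only points to candidate patterns across cases; in the process described above, the cross-country synthesis itself was done by the evaluation team in analysis workshops.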

Throughout the evaluation, the CIMOs evolved significantly through multiple rounds of testing and refinement. The literature review identified 19 CIMOs, refined to 15 following the first year of data collection. In the second year, some were dropped, others merged, and new ones were added, with 14 CIMOs tested. In the third year, the team realised the need to focus on fewer CIMOs to support a more in-depth investigation. A subset was prioritised, guided by DFID’s interests, and a final set of five tested CIMOs was presented in the final report. See below for an example:


Where technical support is provided to incorporate evidence within a policy process or develop a tool to improve evidence access, appraisal or use (I), this can generate high-quality policies or products (O) that showcase the value of evidence for quality, performance and delivery (M) and lead to adoption (O) and diffusion (O) of the procedure or tool. This is more likely where external actors ‘accompany’ government partners to co-produce policies or tools in a flexible, responsive and collaborative way (I), where policies are high priority or tools address a recognised problem (C), and where tools are intuitive and interactive (I) and genuinely facilitate officials to make decisions and do their jobs better and more efficiently (M).

Use and utility of the evaluation

The realist evaluation of BCURE influenced policy and practice within and beyond DFID. The approach added value in two main ways:

  • It offered a robust way to analyse causality. Developing and testing realist CIMO configurations added substantial precision and analytical depth to the findings. The iterative development, testing, and refinement of these theories helped ensure they were grounded in the experience of implementing partners and broader insights from the wider literature, which enhanced relevance, empirical weight, and portability to other settings.
  • It helped generate useful, policy-relevant findings that informed follow-up programmes. This was partly because theories were iteratively refined over three years, informed by stakeholder perspectives and pragmatic considerations about how to frame theories in the most operational way. By pinpointing the contexts and intervention factors necessary for mechanisms to ‘fire,’ the evaluation was also able to generate intuitive and accessible lessons for future projects. This was helped by framing mechanisms in everyday labels and carefully considering how to communicate CIMOs in the final report to minimise realist jargon (discussed further in this practice paper).

All evaluation reports and other outputs are available on the Itad website.

Advice for CHOOSING this approach (tips and traps)

A realist evaluation design is well suited to assess how interventions in complex situations work because it allows the evaluator to deconstruct the causal web of conditions underlying such interventions.

A realist evaluation yields information on how the intervention works (i.e., the generative mechanism) and on the conditions needed for a particular mechanism to work (i.e., the specification of contexts). It is therefore likely to be more useful to policymakers than other types of evaluation.

As with any evaluation, the scope of the realist evaluation needs to be set within the boundaries of available time and resources. Using a realist approach to evaluation is not necessarily more resource or time-intensive than other theory-based evaluations, but it can be more expensive than a simple pre-post evaluation design.

It is generally not possible to explore every aspect of an intervention, especially a large and complex one, using a realist approach. Choices must be made about which parts of an intervention/programme theory to focus on. The needs and interests of the commissioner should guide these.

Advice for USING this approach (tips and traps)

Larger scale or more complicated realist evaluations are ideally carried out by interdisciplinary teams, as this usually allows for a broader consideration of likely mechanisms. However, it is possible for single practitioners to undertake realist evaluation, including in small-scale evaluations.

If the programme theory/MRT is made explicit together with the main actors, it can lead to a better, shared understanding of the intervention. This in turn could improve ownership and lead to more context-appropriate interventions.

Developing the causal theory may also contribute to a better definition of what needs to be evaluated and, thus, what the key evaluation questions are.

Allow sufficient time for assessing the interactions between intervention, actors and context.



Merton, R.K. (1968). Social theory and social structure. New York: The Free Press

Mukumbang, F.C., Marchal, B., Van Belle, S. & van Wyk, B. (2018). Unearthing how, why, for whom and under what health system conditions the antiretroviral treatment adherence club intervention in South Africa works: A realist theory refining approach. BMC Health Serv Res 18, 343.

Punton, M. & Vogel, I. (2020). Keeping it real: using mechanisms to promote use in the realist evaluation of the building capacity to use research evidence program. New Directions for Evaluation 2020(167), 87-100.

Image source: Mobius Transform of a regular circle packing, by fdecomite on Flickr
