52 weeks of BetterEvaluation: Week 29: Weighing the data for an overall evaluative judgement
How do you balance the different dimensions of an evaluation?
Is a new school improvement program a success if it does a better job of teaching mathematics but a worse job of language? Is it a success if it works better for most students but leads to a higher rate of school dropout? What if the dropout rate has increased for the most disadvantaged? And what about the costs of the program? Is it a success if the program gets better results but costs more?
Synthesis is one of the aspects of evaluation that is often not done very systematically or well. Most people take time to identify why they're doing an evaluation, what questions they will ask, and what data they will collect and analyse – but few identify how they will synthesise the data.
For an overview of the issues involved in synthesis, and the methods outlined on the BetterEvaluation website, check out this video from our recent AEA Coffee Break webinar:
Synthesise data from one or more evaluations
Why is synthesis so important?
You need a credible process for data synthesis if you’re asking a question such as: Did the program work? Was it effective? Is it better than the alternatives? Just collecting the data and reporting it is not enough.
For example, let’s imagine you’re evaluating an education program and comparing it to the usual program. And let’s imagine that the methods for describing and measuring the activities, results and context have all been well chosen and well used, as have the methods and designs for causal inference. In short, you’re confident that your results are accurate and can be attributed to the program. Is that enough?
Take a look at the hypothetical results and see if you would have enough information to state whether the new program was better than the old program:
Scenario 1: overall average performance
|Program||Improved student learning|
OK, that was pretty easy.
Scenario 2: performance in different subject areas
|Program||Improved student learning in language||Improved student learning in mathematics|
Hmm. Well, it depends whether you think improvements in language skills are more important than improvements in mathematics.
Scenario 3: positive and negative outcomes
|Program||Improved student learning||Drop out rate|
Scenario 4: different outcomes for different students
|Program||Improved student learning for girls||Improved student learning for boys|
Would we be happy to recommend a program which is actually worse for some groups?
To draw evaluative conclusions, you would probably want to take into account how serious any negative outcomes are, how prevalent they are, and who is experiencing them. In evaluations that look at equity issues, it is extremely important to not just look at the average effect but to look at the effect of the program on the most disadvantaged, such as very poor households, or people with disabilities.
Synthesising data from a single evaluation
On the BetterEvaluation website, you can find information about ten different strategies for synthesis, including guides and examples.
There is information about different processes for synthesising evidence that take into account different values (what would be considered a success in terms of outcomes, processes and distribution of benefits):
- Consensus conference: a process where a selected group of lay people (non-experts) representing the community are briefed, consider the evidence and prepare a joint finding and recommendation
- Expert panel: a process where a selected group of experts consider the evidence and prepare a joint finding
There is information about different techniques that can be used for the synthesis, including some which include consideration of the resources used to produce the outcomes.
Techniques that include consideration of resources used:
- Cost-benefit analysis: compares costs to benefits, both expressed in monetary units
- Cost-effectiveness analysis: compares costs to the outcomes expressed in terms of a standardized unit (e.g. additional years of schooling)
- Cost-utility analysis: a particular type of cost-effectiveness analysis that expresses benefits in terms of a standard unit such as Quality Adjusted Life Years
- Value for money: a term used in different ways, including as a synonym for cost-effectiveness, and as a systematic approach to considering these issues throughout planning and implementation, not only in evaluation.
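To make the cost-effectiveness idea concrete, here is a minimal sketch in Python. The function names and all of the figures are hypothetical, invented purely for illustration; the outcome unit follows the example above (additional years of schooling). It computes the cost per unit of outcome for each program, and the extra cost per extra unit of outcome when moving from the usual program to the new one.

```python
def cost_per_unit(cost, outcome_units):
    """Cost-effectiveness ratio: cost per unit of outcome achieved."""
    if outcome_units <= 0:
        raise ValueError("outcome_units must be positive")
    return cost / outcome_units


def incremental_cost_per_unit(cost_new, outcome_new, cost_old, outcome_old):
    """Extra cost per extra unit of outcome when switching from old to new."""
    extra_outcome = outcome_new - outcome_old
    if extra_outcome == 0:
        raise ValueError("both programs produce the same outcome; ratio undefined")
    return (cost_new - cost_old) / extra_outcome


# Hypothetical figures, invented for illustration only:
# total program cost and additional years of schooling produced.
usual_cost, usual_years = 100_000, 50
new_cost, new_years = 150_000, 70

usual_ratio = cost_per_unit(usual_cost, usual_years)
new_ratio = cost_per_unit(new_cost, new_years)
extra_ratio = incremental_cost_per_unit(new_cost, new_years, usual_cost, usual_years)

print(f"Usual program: {usual_ratio:.0f} per additional year of schooling")
print(f"New program:   {new_ratio:.0f} per additional year of schooling")
print(f"Switching:     {extra_ratio:.0f} per extra year gained")
```

With these made-up numbers the new program produces more schooling but at a higher cost per year, so the figures alone do not settle whether it is "better". That still depends on what the extra outcome is judged to be worth.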
Other techniques for synthesis:
- Multi-criteria analysis: a systematic process to address multiple criteria and perspectives
- Numeric weighting: developing numeric scales to rate performance against each evaluation criterion and then adding them up for a total score.
- Qualitative weight and sum (PDF): using qualitative ratings (such as symbols) to identify performance in terms of essential, important and unimportant criteria
- Rubrics: using a descriptive scale for rating performance that incorporates performance across a number of criteria
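As a rough illustration of numeric weighting, the sketch below rates two programs against three criteria and adds up the weighted ratings. The criteria, ratings (on an assumed 1 to 5 scale) and weights are all hypothetical, and multiplying each rating by a weight before summing is one common way to do it rather than the only one. Running it with two different sets of weights shows how the overall judgement depends on which criteria are valued most.

```python
def total_score(ratings, weights):
    """Sum of rating * weight over every criterion."""
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same criteria")
    return sum(ratings[criterion] * weights[criterion] for criterion in ratings)


# Hypothetical ratings on an assumed 1-5 scale (5 = best).
new_program = {"language": 2, "mathematics": 4, "retention": 3}
usual_program = {"language": 3, "mathematics": 3, "retention": 3}

# Two hypothetical sets of weights reflecting different values.
language_first = {"language": 3, "mathematics": 1, "retention": 1}
maths_first = {"language": 1, "mathematics": 3, "retention": 1}

for label, weights in [("language first", language_first),
                       ("mathematics first", maths_first)]:
    new_total = total_score(new_program, weights)
    usual_total = total_score(usual_program, weights)
    print(f"{label}: new = {new_total}, usual = {usual_total}")
```

Under the "language first" weights the usual program scores higher, and under the "mathematics first" weights the new program does. The weights carry the value judgement, so they need to be agreed and justified, not just chosen for convenience.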
There are also approaches which include options for synthesis in an overall "package" of methods:
- Social return on investment: a systematic way of incorporating social, environmental, economic and other values into decision-making processes
Synthesising evidence across multiple evaluations
Synthesis is also needed to combine data from across multiple evaluations. Increasingly, organisations are looking to learn from the multiple evaluations that have already been done. These syntheses across evaluations can be a simple literature review or rapid evidence appraisal, or an exhaustive (and exhausting) systematic review using meta-analysis, realist synthesis, or meta-ethnography.
On the BetterEvaluation website, you can find information about different strategies for synthesis, including guides and examples.
There are different ways of doing systematic reviews:
- Best evidence synthesis: a synthesis that, like a realist synthesis, draws on a wide range of evidence (including single case studies) and explores the impact of context, and also builds in an iterative, participatory approach to building and using a knowledge base.
- Meta-analysis: a statistical method for combining numeric evidence from experimental (and sometimes quasi-experimental) studies to produce a weighted average effect size.
- Meta-ethnography: a method for combining data from qualitative evaluation and research, especially ethnographic data, by translating concepts and metaphors across studies.
- Realist synthesis: synthesizing all relevant existing research in order to make evidence-based policy recommendations.
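The "weighted average effect size" in a meta-analysis can be sketched in a few lines. The example below assumes a fixed-effect model with inverse-variance weights, which is one common choice (random-effects models are another), and the effect sizes and standard errors are hypothetical. More precise studies, those with smaller standard errors, get more weight in the pooled estimate.

```python
import math


def pooled_effect(effects, std_errors):
    """Fixed-effect, inverse-variance weighted average of study effect sizes.

    Returns the pooled effect and its standard error.
    """
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("need one standard error per effect, and at least one study")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]  # weight = 1 / variance
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect sizes and standard errors from three studies.
effects = [0.2, 0.5, 0.3]
std_errors = [0.1, 0.2, 0.1]

pooled, pooled_se = pooled_effect(effects, std_errors)
print(f"Pooled effect: {pooled:.3f} (standard error {pooled_se:.3f})")
```

Here the second study reports the largest effect but is the least precise, so it pulls the pooled estimate up less than the other two pull it down.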
There is an overview page about systematic reviews:
- Systematic review: a synthesis that takes a systematic approach to searching, assessing, extracting and synthesizing evidence from multiple studies. Meta-analysis, meta-ethnography and realist synthesis are different types of systematic review.
And there are less rigorous and quicker ways of finding and summarising evidence from multiple studies:
- Rapid evidence assessment: a process that is faster and less rigorous than a full systematic review but more rigorous than ad hoc searching. It uses a combination of key informant interviews and targeted literature searches to produce a report in a few days or a few weeks.
- Vote counting: comparing the number of positive studies (studies showing benefit) with the number of negative studies (studies showing harm).
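Vote counting is simple enough to show in a few lines. In this sketch each study has been coded by hand as "positive", "negative" or "null"; the labels and the list of findings are hypothetical.

```python
from collections import Counter


def vote_count(findings):
    """Return (number of positive studies, number of negative studies)."""
    counts = Counter(findings)
    return counts["positive"], counts["negative"]


# Hypothetical coded findings from five studies.
findings = ["positive", "positive", "negative", "positive", "null"]

positive, negative = vote_count(findings)
print(f"{positive} studies showing benefit, {negative} showing harm")
```

Note what the tally leaves out: a large, well-run study counts the same as a small one, and the size of each effect is ignored. That is part of why this method is quicker and less rigorous than a full systematic review.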
Image credit: Old two pan balance, via Flickr
This blog post is part of a series of eight posts covering the BetterEvaluation Framework and presenting the recordings of eight corresponding webinars hosted by the American Evaluation Association. The full series of posts is below.