52 weeks of BetterEvaluation: Week 29: Weighing the data for an overall evaluative judgement

How do you balance the different dimensions of an evaluation?

Is a new school improvement program a success if it does a better job of teaching mathematics but a worse job of language? Is it a success if it works better for most students but leads to a higher rate of school dropout? What if the dropout rate has increased for the most disadvantaged? And what about the costs of the program? Is it a success if the program gets better results but costs more?

Synthesis is an aspect of evaluation that is often done unsystematically or poorly. Most people take time to identify why they're doing an evaluation, what questions they will ask, and what data they will collect and analyse, but few plan how they will synthesise the data.

For an overview of the issues involved in synthesis, and the methods outlined on the BetterEvaluation website, check out this video from our recent AEA Coffee Break webinar:

Synthesise data from one or more evaluations

Why is synthesis so important?

You need a credible process for data synthesis if you’re asking a question such as: Did the program work? Was it effective? Is it better than the alternatives? Just collecting the data and reporting it is not enough.

For example, let’s imagine you’re evaluating an education program and comparing it to the usual program. And let’s imagine that the methods you’ve used to describe and measure the activities, results and context, as well as the designs and methods used for causal inference, were all well chosen and well implemented. In short, you’re confident that your results are accurate and can be attributed to the program. Is that enough?

Take a look at these hypothetical results and decide whether you would have enough information to say whether the new program is better than the old program:

Scenario 1: overall average performance

Program	Improved student learning
New program	20%
Old program	10%

OK, that was pretty easy.

Scenario 2: performance in different subject areas

Program	Improved student learning in language	Improved student learning in mathematics
New program	5%	35%
Old program	10%	10%

Hmm. Well, it depends on whether you think improvements in language skills are more important than improvements in mathematics.

Scenario 3: positive and negative outcomes

Program	Improved student learning	Dropout rate
New program	20%	30%
Old program	10%	5%

Scenario 4: different outcomes for different students

Program	Improved student learning for girls	Improved student learning for boys
New program	5%	35%
Old program	10%	5%

Would we be happy to recommend a program which is actually worse for some groups?

To draw evaluative conclusions, you would probably want to take into account how serious any negative outcomes are, how prevalent they are, and who is experiencing them. In evaluations that look at equity issues, it is essential to look not just at the average effect but also at the effect of the program on the most disadvantaged, such as very poor households or people with disabilities.
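To make the point about averages concrete, here is a minimal Python sketch that takes the Scenario 4 figures and compares the overall average with the result for each group. It assumes, hypothetically, that girls and boys are equal in number, so the overall average is a simple unweighted mean; the helper names `average` and `groups_worse_off` are illustrative only.

```python
# Scenario 4 figures from the table above (improved student learning, %).
# Hypothetical assumption: girls and boys are equal in number, so a simple
# unweighted mean stands in for the overall average.
results = {
    "New program": {"girls": 5.0, "boys": 35.0},
    "Old program": {"girls": 10.0, "boys": 5.0},
}


def average(by_group: dict[str, float]) -> float:
    """Unweighted mean across groups (assumes equal group sizes)."""
    return sum(by_group.values()) / len(by_group)


def groups_worse_off(new: dict[str, float], old: dict[str, float]) -> list[str]:
    """Groups whose result under the new program is below the old program's."""
    return [group for group in new if new[group] < old[group]]


new, old = results["New program"], results["Old program"]
print(f"Average: new {average(new):.1f}%, old {average(old):.1f}%")
print("Worse off under the new program:", groups_worse_off(new, old))
```

With these figures the new program looks better on average (20% against 7.5%), yet the check flags girls as worse off. In a real evaluation you would weight each group by its size and then judge how serious and how widespread the shortfall is.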

Synthesising data from a single evaluation

On the BetterEvaluation website, you can find information about ten different strategies for synthesis, including guides and examples.

There is information about different processes for synthesising evidence that take into account different values (what would be considered a success in terms of outcomes, processes and distribution of benefits):

Consensus conference: a process where a selected group of lay people (non-experts) representing the community are briefed, consider the evidence and prepare a joint finding and recommendation
Expert panel: a process where a selected group of experts consider the evidence and prepare a joint finding

There is also information about different techniques for synthesis, including some that take into account the resources used to produce the outcomes.

Techniques that take into account the resources used:

Cost-benefit analysis: compares costs to benefits, both expressed in monetary units
Cost-effectiveness analysis: compares costs to outcomes expressed in a standardised unit (e.g. additional years of schooling)
Cost-utility analysis: a particular type of cost-effectiveness analysis that expresses benefits in a standard unit such as Quality-Adjusted Life Years (QALYs)
Value for money: a term used in different ways, including as a synonym for cost-effectiveness and as a systematic approach to considering these issues throughout planning and implementation, not only in evaluation
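The difference between the cost-effectiveness and cost-benefit views can be sketched in a few lines of Python. This is a minimal illustration under loudly stated assumptions: the outcome figures reuse Scenario 1 (read here as the percentage of students whose learning improved), while the costs per student, the cohort size and the monetary value of one improved student are invented for the example. The function names are hypothetical.

```python
# Hypothetical figures: outcome percentages reuse Scenario 1 (read as the
# percentage of students whose learning improved); costs per student and the
# monetary value of one improved student are invented for illustration.
COHORT = 100  # number of students in the illustrative cohort

new = {"cost_per_student": 500.0, "improved_pct": 20.0}
old = {"cost_per_student": 400.0, "improved_pct": 10.0}


def extra_cost(new: dict, old: dict, cohort: int) -> float:
    """Additional cost of running the new program for the whole cohort."""
    return (new["cost_per_student"] - old["cost_per_student"]) * cohort


def extra_improved(new: dict, old: dict, cohort: int) -> float:
    """Additional number of students whose learning improved."""
    return (new["improved_pct"] - old["improved_pct"]) * cohort / 100


def incremental_cost_effectiveness(new: dict, old: dict, cohort: int) -> float:
    """Cost-effectiveness view: extra cost per additional student who improved."""
    return extra_cost(new, old, cohort) / extra_improved(new, old, cohort)


def incremental_net_benefit(new: dict, old: dict, cohort: int,
                            value_per_improved: float) -> float:
    """Cost-benefit view: monetised extra benefit minus extra cost."""
    benefit = extra_improved(new, old, cohort) * value_per_improved
    return benefit - extra_cost(new, old, cohort)


print(incremental_cost_effectiveness(new, old, COHORT))  # 1000.0
print(incremental_net_benefit(new, old, COHORT, value_per_improved=1500.0))  # 5000.0
```

The cost-effectiveness figure (1,000 per additional improved student) still leaves open whether that price is worth paying; the cost-benefit figure answers that only because a monetary value per improved student has been assumed, which is usually the most contested step.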

Other techniques for synthesis:

Multi-criteria analysis: a systematic process to address multiple criteria and perspectives
Numeric weighting: developing numeric scales to rate performance against each evaluation criterion, applying weights that reflect each criterion's importance, and then adding up the weighted ratings for a total score
Qualitative weight and sum: using qualitative ratings (such as symbols) to identify performance in terms of essential, important and unimportant criteria
Rubrics: using a descriptive scale for rating performance that incorporates performance across a number of criteria
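Numeric weighting makes the Scenario 2 dilemma explicit: the answer depends on the weights. The minimal Python sketch below reuses the Scenario 2 figures, which conveniently share one scale (percentages); in practice each criterion would first be put on a common numeric scale. The two weightings are hypothetical, and dividing by the total weight gives a weighted average, which ranks programs the same way as a plain weighted sum.

```python
# Scenario 2 results from the table above (improved student learning, %).
results = {
    "New program": {"language": 5, "mathematics": 35},
    "Old program": {"language": 10, "mathematics": 10},
}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of criterion scores; weights express relative importance."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def rank(results: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Programs ordered from highest to lowest weighted score."""
    return sorted(results, key=lambda p: weighted_score(results[p], weights),
                  reverse=True)


# Hypothetical weightings: equal importance vs. language valued 9 times maths.
equal = {"language": 1, "mathematics": 1}
language_first = {"language": 9, "mathematics": 1}

print(rank(results, equal))           # ['New program', 'Old program']
print(rank(results, language_first))  # ['Old program', 'New program']
```

The ranking flips between the two weightings, which is why the weights themselves need to be agreed on and justified, for example through an expert panel or consensus conference.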

There are also approaches that include options for synthesis as part of an overall "package" of methods:

Social return on investment: a systematic way of incorporating social, environmental, economic and other values into decision-making processes

Check out this resource we've added:

ACFID and value for money - Discussion paper

The Australian Council for International Development (ACFID) created this discussion paper to generate meaningful debate and to help the sector work with AusAID to better define value for money.

Synthesising evidence across multiple evaluations

Synthesis is also needed to combine data from multiple evaluations. Increasingly, organisations are looking to learn from the many evaluations that have already been done. These syntheses across evaluations can range from a simple literature review or rapid evidence assessment to an exhaustive (and exhausting) systematic review using meta-analysis, realist synthesis or meta-ethnography.

On the BetterEvaluation website, you can also find information about strategies for synthesising evidence across evaluations, including guides and examples.

There are different ways of doing systematic reviews:

Best evidence synthesis: a synthesis that, like a realist synthesis, draws on a wide range of evidence (including single case studies) and explores the impact of context, and also incorporates an iterative, participatory approach to building and using a knowledge base.
Meta-analysis: a statistical method for combining numeric evidence from experimental (and sometimes quasi-experimental) studies to produce a weighted average effect size.
Meta-ethnography: a method for combining data from qualitative evaluation and research, especially ethnographic data, by translating concepts and metaphors across studies.
Realist synthesis: synthesising all relevant existing research in order to make evidence-based policy recommendations.
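The "weighted average effect size" in meta-analysis is most simply computed with inverse-variance weights, so that more precise studies count for more. The Python sketch below shows a fixed-effect version only; the three studies' effect sizes and standard errors are invented for illustration, and a real review would also check heterogeneity and often use a random-effects model instead.

```python
import math

# Hypothetical effect sizes (e.g. standardised mean differences) and their
# standard errors from three studies; not real data.
effects = [0.30, 0.10, 0.50]
std_errors = [0.125, 0.25, 0.5]


def fixed_effect_meta(effects: list[float],
                      std_errors: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted average effect size and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies count more
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


pooled, pooled_se = fixed_effect_meta(effects, std_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled effect {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
```

A fixed-effect model assumes all studies estimate one common true effect; when programs and contexts differ, as they usually do in evaluation, a random-effects model that allows the true effect to vary between studies is the more common choice.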

There is an overview page about systematic reviews:

Systematic review: a synthesis that takes a systematic approach to searching, assessing, extracting and synthesising evidence from multiple studies. Meta-analysis, meta-ethnography and realist synthesis are different types of systematic review.

There are also quicker, less rigorous ways of finding and summarising evidence from multiple studies:

Rapid evidence assessment: a process that is faster and less rigorous than a full systematic review but more rigorous than ad hoc searching. It uses a combination of key informant interviews and targeted literature searches to produce a report in a few days or a few weeks.
Vote counting: comparing the number of positive studies (studies showing benefit) with the number of negative studies (studies showing harm).
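Vote counting is simple enough to show in a few lines. In this minimal Python sketch the seven studies and their classification ("benefit", "harm" or "no clear effect") are hypothetical, and studies that show neither benefit nor harm are tallied separately rather than discarded.

```python
from collections import Counter

# Hypothetical classification of seven studies by the direction of their finding.
findings = ["benefit", "benefit", "harm", "benefit",
            "no clear effect", "benefit", "harm"]


def vote_count(findings: list[str]) -> dict[str, int]:
    """Count studies showing benefit vs. harm; everything else is 'other'."""
    tally = Counter(findings)
    positive, negative = tally["benefit"], tally["harm"]
    return {"positive": positive, "negative": negative,
            "other": len(findings) - positive - negative}


print(vote_count(findings))  # {'positive': 4, 'negative': 2, 'other': 1}
```

Every study gets one vote regardless of its size, quality or the magnitude of its effect, so a handful of small positive studies can outvote one large, well-designed negative study. This is the main reason vote counting is treated as a quick, low-rigour option compared with meta-analysis.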