Principle-led planning for analysis with artificial intelligence (AI)

Are you thinking about using Generative AI (GenAI) to analyse your qualitative data? A principle-led analysis plan may help you navigate this process and make decisions that are both ethically sound and practical for your analysis task.

Generative AI (GenAI) is a type of artificial intelligence that creates new content—including text, images, audio, video, and code—in response to user queries by identifying and learning from patterns in data.

There are many ethical and practical considerations related to incorporating GenAI into evaluation practice. Environmental impacts, privacy concerns, accuracy issues, and Western/white cultural bias are just a few of the challenges associated with AI (as discussed in the Institute of Development Studies’ (IDS) Ten Reasons Not to Use AI for Development, the Canadian Centre for Cyber Security’s report on Generative AI, and the UK government’s discussion paper on Frontier AI risks, among others). At the same time, market forces are likely to pressure many organisations to adopt AI before these problems have been fully understood and addressed. We’ll all be learning-while-doing for the foreseeable future.

AI applications are under development for tasks across the entire evaluation process, from data collection to report writing. This document focuses on analysis, specifically analysis of qualitative data using Generative AI, and on how a principle-led analysis plan can help you navigate AI adoption and make analysis decisions that are both ethically sound and practical.

Why is analysis planning important when integrating AI into evaluation and research?

Generative AI will not magically transform raw data into useful findings. In fact, to produce accurate results, GenAI requires careful guidance by humans who are skilled in analysis. Evaluators and researchers seeking to strengthen analysis with this new technology must invest more – not less – expertise, time, and attention. Pioneering practitioners describe deliberate and iterative analytical processes including strategic segmentation of the data, testing and verification of interim results, cycles of inquiry and refinement of research questions, and careful interpretation. Each of these steps involves human judgement.

Unfortunately, GenAI is arriving on the scene at a time when careful analysis of qualitative data seems to be less valued by commissioners and, for a variety of reasons, declining in evaluation practice outside of academia. Too often, analysis is under-resourced in evaluation scopes of work, plans, and budgets. These days, teams that I coach often lack sufficient training and experience for the job at hand.

When integrating GenAI into analysis, reflective processes are critical for helping evaluators make methodologically and ethically sound decisions. The process of drafting an analysis plan encourages evaluators to thoughtfully consider analysis choices and to document the decisions and their rationale. Ideally, planning should involve collegial discussions among those participating in the evaluation and/or affected by its findings. And plans should be updated if/when analysis evolves in unexpected ways.

How can principles help with planning AI-assisted analysis?

Most of us are in uncharted territory when it comes to using Generative AI in evaluation. One of the challenges is that some of our common analysis procedures are not a good fit when used with GenAI models. As we trial and test new analysis approaches, we are faced with many decisions, such as:

  • How much data is needed to meet the evaluation purpose?
  • Should we analyse the whole data set at once? Or should we segment the data for analysis?
  • How can we ensure accuracy in AI-assisted analysis?
  • Who should participate in analysis, interpretation and sensemaking?
  • At what point in the evaluation process and for which analysis tasks should we apply AI?
  • How do we protect privacy?
  • Which AI tools have been developed with ethical, people-first practices?

To answer questions like these, we need decision-making processes that are both flexible and consistent. That’s where principles can help. Principles provide context-sensitive guidance about what to do or not do, occupying the sweet spot between values (not specific enough about action) and rules (not flexible enough to apply across diverse situations). Using principles (PDF), evaluators working on different evaluations, across diverse project contexts, and with data of differing characteristics can make analysis choices that are consistent.

How do you go about drafting a principle-led analysis plan?

Guiding principles may be drawn from different sources. The OECD’s Principles on AI and the G20’s Principles for responsible stewardship of trustworthy AI, both adopted in 2019, are the first intergovernmental standards on AI. Relevant examples from the international non-profit sector can be found in the Global Initiative for Digital Assurance (GIDA) report and the Principles for Digital Development.

I suggest considering principles that undergird social programs (e.g. Do No Harm) and evaluation practice. Professional evaluation associations are a good starting point (e.g. AEA’s Guiding Principles for Evaluators), as are international bodies (e.g. UNEG’s Ethical Guidelines). Many funders have established principles, as do organisations representing local communities or indigenous peoples. The various organisations associated with the project being evaluated may also have agreed-upon values or ways of working. In Outcome Harvesting, there are nine principles that guide the application of that specific approach.

Drawing from widely recognised principles can provide a strong foundation for an evaluation, but principles should also be informed by and responsive to the specifics of your context—they do not always have to come from an external source.

Whatever sources you start with, select principles that are directly relevant to the evaluation at hand and will provide practical guidance for your analysis decisions, that is, they will help you answer questions such as those outlined above and detailed below. In a recent AI-assisted analysis pilot I conducted with Steve and Gabriele at Causal Map, we selected three principles to steer our analysis choices:

  1. Prioritise local leadership in the evaluation
  2. Protect the integrity of the Outcome Harvesting approach
  3. Produce accurate AI results to provide actionable answers to harvest questions

Start a principle-led analysis plan early in the evaluation process to make thoughtful choices about where and how to use AI-assisted analysis. At its core, a principle-led analysis plan should document:

  1. The principles that will steer the analysis
  2. An overview of the analysis process
  3. The specific analysis decisions and the rationale behind them

The overview of the analysis process should start with the steps as they are understood at the time of drafting. As analysis unfolds, new steps may be added as the need arises; for example, some preliminary results may require further exploration or verification.

The analysis plan is likely to be part of a comprehensive evaluation plan. You may find it useful to begin the analysis plan with a brief restatement of the key points from the evaluation plan that are foundational to the analysis. Often this information is summarised in an evaluation matrix and covers, at a minimum, the items below; a sketch of how these elements might be captured follows the list.

  • Evaluation purpose and questions: A short summary of the evaluation’s purpose and the key questions that it is intended to answer
  • Evaluation design or main approach
  • The type and amount of data, as well as any special characteristics
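To make these elements concrete, here is a minimal sketch of how a plan’s skeleton might be captured in structured form. The field names and sample values are illustrative assumptions, not a prescribed template.

```python
# Illustrative skeleton of a principle-led analysis plan.
# Field names and sample values are examples only, not a standard template.
analysis_plan = {
    "purpose_and_questions": "Why the evaluation is being done and what it must answer",
    "design": "Main approach, e.g. Outcome Harvesting",
    "data": {
        "type": "outcome descriptions drafted from interview transcripts",
        "amount": "number of records",
        "special_characteristics": "e.g. multilingual, contains PII",
    },
    "principles": [
        "Prioritise local leadership in the evaluation",
        "Protect the integrity of the Outcome Harvesting approach",
        "Produce accurate AI results to provide actionable answers",
    ],
    "analysis_steps": [
        "Segment the data by outcome area",
        "GenAI-assisted causal mapping of each segment",
        "Error-check each AI-generated result",
        "Interpretation and sensemaking with local evaluators",
    ],
    "decisions_and_rationale": [],  # append each choice and the reasoning behind it
}
```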

What analysis choices should the plan cover?

I’ve outlined below a few of the analysis choices we face when we begin to conduct analysis. The plan can help make these choices explicit, and our principles can help ensure that our decisions are intentional. A principle-led analysis plan is intended to help you generate accurate results that answer the evaluation questions within the available time and resources, complementing, rather than competing with, the evaluation’s methodology and primary approach or methods. Planning starts with “How can GenAI add value to this evaluation?” rather than “What can AI do?”

How much data? Use principles to decide how much data is needed to meet the evaluation purpose

Arguments in favour of Generative AI tend to focus on dealing with larger and larger amounts of data. But use of this technology should not be an incentive to collect more data than is necessary to answer the evaluation’s questions and meet the evaluation’s purpose. The Outcome Harvesting principle “less is more” emphasises learning and reminds evaluators not to collect more data than the evaluation can analyse and use with the available time, resources, and capacity. Evaluators may find it useful to adopt a principle of 'good resource stewardship' to guide choices about how to allocate resources across the evaluation process (e.g. between data collection and analysis).

We don’t need billions of data points for GenAI to add value to analysis in evaluation. Consider the analysis challenges that face the evaluation team and weigh the data set against the resources available to produce useful findings; that is a more useful measure of size. Evaluation teams facing time and resource constraints may find it challenging to analyse even a relatively small data set. When making choices about how to integrate AI into analysis, start with “What analysis challenges or capacity gaps am I trying to address?” Assess the analytical task and the team’s capacity, then plan to apply GenAI to address those specific analytical challenges or capacity and resource limitations.

What data? Segment data to produce coherent answers to each question and ensure accuracy

One basic question to answer in an analysis plan is whether to analyse the full data set at once or subdivide it. Here the considerations include:

  1. Providing meaningful answers to the evaluation questions
  2. Error-checking AI results

Evaluators should revisit the evaluation questions and ask: What analysis steps will produce clear, actionable answers for evaluation users? This may involve drafting sub-questions and structuring the analysis to answer each one in order to answer a main evaluation question. In our GenAI-assisted analysis of Outcome Harvesting data, we chose to segment and analyse the data by outcome area. We reasoned that analysis by outcome area would be more likely to produce meaningful answers about the causal pathways contributing to outcomes, and to provide actionable findings for decision-makers, because outcome areas are often aligned with project implementation strategies and staffing arrangements. While we segmented data by outcome area for our pilot, analysis across outcome areas would be useful to answer questions about interrelationships between outcomes.
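As an illustration of this kind of segmentation, here is a minimal Python sketch that groups outcome descriptions by outcome area before each segment is analysed. The record structure and the placeholder GenAI call are assumptions for illustration, not the Causal Map app’s actual interface.

```python
from collections import defaultdict

# Hypothetical records: each outcome description carries a tag for the
# outcome area it belongs to.
outcome_descriptions = [
    {"id": "OD-01", "outcome_area": "community-based education", "text": "..."},
    {"id": "OD-02", "outcome_area": "classroom education", "text": "..."},
    {"id": "OD-03", "outcome_area": "community-based education", "text": "..."},
]

def segment_by_outcome_area(records):
    """Group records so each outcome area is analysed as one coherent batch."""
    segments = defaultdict(list)
    for record in records:
        segments[record["outcome_area"]].append(record)
    return segments

# Analyse each segment separately so results stay aligned with the evaluation
# question for that outcome area and remain small enough to error-check.
for area, records in segment_by_outcome_area(outcome_descriptions).items():
    print(f"Segment '{area}': {len(records)} outcome description(s)")
    # results = genai_tool.analyse(records)  # placeholder for the GenAI-assisted step
```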

Segmenting data for analysis also makes it easier to check the results for errors. Generative AI algorithms are still plagued by problems such as hallucinations, systemic errors, and bias, and can produce erroneous results, so it is essential to include careful quality assurance and error-checking at each step. Human judgement is required to 1) error-check the data and the GenAI results, 2) interpret the GenAI results, and 3) make the judgements needed to answer the evaluation questions.
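One way to build automated support into that error-checking (a sketch of a common technique, not the pilot’s actual procedure) is to ask the GenAI tool to return a verbatim supporting quote with each result, then flag any result whose quote cannot be found in the source text for human review.

```python
def flag_unsupported_results(results, source_text):
    """Return results whose supporting quote is not found verbatim in the
    source text; these should go to a human analyst for review.

    `results` is a hypothetical list of dicts such as
    {"claim": "...", "quote": "..."} returned by a GenAI step.
    """
    normalised_source = " ".join(source_text.split()).lower()
    needs_review = []
    for result in results:
        quote = " ".join(result["quote"].split()).lower()
        if quote not in normalised_source:
            needs_review.append(result)  # possible hallucination or loose paraphrase
    return needs_review
```

A check like this is only a first pass: it catches fabricated quotes, but not misreadings of genuine ones, so human interpretation remains essential.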

As with any other analysis tool, Generative AI will reproduce errors from the data in its results. In addition, GenAI does not account well for nuanced contextual factors such as cultural specifics. Human judgement often relies on subtle contextual clues to make sense of things; for example, a term may mean one thing in one context and something else in another, and the ability to recognise that shift still eludes the algorithms. Quality assurance processes to prepare the raw data for analysis should involve those knowledgeable about the data collection process, the project, and its context. They can help fill information gaps, remove errors, and address anomalies and ambiguous terms.

During analysis, information about the project can also be critical for error-checking results. For example, our Outcome Harvesting (OH) pilot analysed data on outcomes in community-based education activities. In interviews, respondents used the word “teacher” to refer to instructors in both classroom and community-based settings. This ambiguity led to erroneous results about the project’s community-based education outcomes. Because our team error-checked each AI-generated map, we caught the error and removed the data related to classroom outcomes.
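One way to surface ambiguities like this systematically is to scan the prepared data for known ambiguous terms and route matching records to human review. This is a sketch only; the term list is illustrative and would, in practice, be built with those who know the project and its context.

```python
# Terms known (from project knowledge) to carry more than one meaning in the data.
AMBIGUOUS_TERMS = {
    "teacher": ["classroom instructor", "community-based instructor"],
}

def flag_ambiguous_records(records):
    """Yield (record id, ambiguous terms found) so a human can disambiguate."""
    for record in records:
        text = record["text"].lower()
        found = [term for term in AMBIGUOUS_TERMS if term in text]
        if found:
            yield record["id"], found
```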

Who? Use principles to decide who should participate in analysis and sensemaking

Many evaluation practitioners recognise the importance of prioritising local leadership in programs and evaluations. Reliance on specialised technology (and the expertise required to operate it) has the potential to work against local leadership if those using the AI-assisted tools sit elsewhere, because tool users necessarily take on many analysis decisions. Every AI-assisted task involves analysts making decisions about data segmentation, as well as cycles of results testing, verification, and refinement. Who will take part in these decisions? And how will this influence sensemaking and interpretation? During our GenAI-assisted OH pilot, we explored ways to ensure that key sensemaking and learning roles remained in the hands of local evaluators and decision-makers.

Steering by our principle of prioritising local leadership in evaluation, we asked:

  • Where is local leadership most important?
  • Where does it contribute to accurate and useful findings?
  • What parts of the evaluation process should remain in the hands of local evaluators?

When? Use principles to decide at what point in the analysis process to apply AI

Considerations of roles (who participates and how) are closely tied to choices about the point in the analysis at which to incorporate an AI-assist. You may be tempted to apply GenAI earlier in the evaluation and research process than you normally would. You may think: I have this amazing tool, why not use it as soon as possible? If I have a sports car in the garage, why am I still riding a bicycle?

My recommendation is to stop and consider how this will influence the method and participation in the analysis. During our pilot, we noticed that where we placed the GenAI-assisted analysis tasks (with corresponding error-checking and interpretation tasks) influenced the overall evaluation process.

Consider:

  • How does the placement of the GenAI-assist in the evaluation influence the process and findings?
  • What is the best point in the evaluation process to use GenAI-assisted analysis tools?

For many of the projects where I work, local team members play critical roles in drafting rich and accurate individual outcome descriptions from the interview transcripts, and local teams report that they learn valuable lessons from this process. For that reason, we decided that, in our pilot, local evaluators would draft the outcome descriptions, and we would position the GenAI-assist at the point where evaluation teams tend to struggle – when analysing multiple outcome descriptions to understand whether and how the causal pathways are interconnected. Equally important, analysing the outcome descriptions produced by the local evaluation team resulted in more accurate GenAI results with less disruption to the OH approach. The local evaluation team had used their knowledge of the project and its context to thoroughly review the transcript data, fill information gaps, and remove errors, anomalies, and ambiguous terms to craft complete and accurate outcome descriptions.

What about privacy? Use principles to decide what data to share and which AI tools to use

Privacy issues present another set of decisions for analysis planning, and they come up in at least two ways. First, if data intended for analysis contains personally identifiable information (PII) or content on sensitive topics that 1) should not be shared with those beyond the evaluation team or 2) is too complex or nuanced, then analysts should consider whether it is appropriate to redact the data before sending it to the GenAI tool, or whether a human analyst would be a more appropriate choice. Second, evaluators may make thoughtful choices about the specific AI tool, i.e. the Generative AI large language model, that they use for analysis. Not all AI tools were developed in alignment with “people-first data practices that prioritise transparency, consent, and redressal while allowing people and communities to retain control of and derive value from their own data” (Principles for Digital Development). For our pilot, we used the AI-assisted Causal Map app to identify and visually represent causal relationships in Outcome Harvesting data.
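As a minimal illustration of the redaction step, here is a sketch of rule-based PII masking applied before data leaves the evaluation team. Real PII detection is much harder than this (names, places, and indirect identifiers usually need human review or dedicated tooling), and the patterns below are assumptions for illustration only.

```python
import re

# Simple placeholder patterns; genuine redaction needs review by humans
# who know the data and its context.
REDACTION_PATTERNS = {
    "[EMAIL]": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "[PHONE]": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text, known_names=()):
    """Replace obvious PII patterns and a team-supplied list of known names."""
    for placeholder, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(placeholder, text)
    for name in known_names:  # names gathered by the evaluation team
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text

print(redact("Contact Amina at amina@example.org or +1 555 010 0199.",
             known_names=["Amina"]))
# -> Contact [NAME] at [EMAIL] or [PHONE].
```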

Summary

When using GenAI to analyse your qualitative data, use a principle-led analysis plan to turn implicit choices into intentional, ethically and methodologically sound decisions.

Many of the lessons described here were learned in collaboration with Steve and Gabriele at Causal Map when we piloted ways that AI-assisted causal mapping can address limitations in Outcome Harvesting analysis (Strengthening Outcome Harvesting Analysis with AI-Assisted Causal Mapping (PDF)).