The science behind data collection - gLOCAL 2023 webinar recap

Thembi Mahlangu

This blog was contributed by Thembi Mahlangu from Khulisa Management Services. Khulisa Management Services is a leading Monitoring & Evaluation (M&E) consulting firm with extensive cross-sectoral experience, including democracy, human rights, and governance (DRG), health, education, energy, economic growth, youth, agriculture, and social development. Khulisa has been “measuring progress accurately” for 30 years. Khulisa specializes in providing M&E services, technical assistance, capacity building, training and development, systems strengthening, institutional support, learning, sharing and other related services to public and private sector partners.

This blog shares some of the lessons from the gLOCAL2023 webinar, The science behind data collection: how to choose the best tools and approach to collect data considering the culture, context, and existing partnerships. The webinar was presented on May 31, 2023 at 14:00 (SAST) and outlined how to design data collection efforts through five considerations: the budget, respondents and accessibility, the kind of data being collected, challenges to data collection and how the data will be used.

While this blog focuses on data collection tools, it's worth noting that the choice of tool, while an important part of the evaluation design, is just one step in a larger process. An effective evaluation design goes beyond the tools and takes a holistic approach to answering the questions at hand. This includes understanding the purpose of your evaluation, deciding on key evaluation questions, and defining the scope of your study. The context, stakeholders involved, and the resources available for evaluation also significantly influence these decisions. For a more in-depth understanding of the overall process of evaluation design, the BetterEvaluation Rainbow Framework provides a systematic guide to the different stages of the evaluation process, from defining what is to be evaluated to synthesizing data for meaningful conclusions. The data collection methods discussed in this blog form part of the ‘Describe’ task, and, as discussed by the webinar participants, there are a number of considerations at this stage that can help you make the best choices about data collection tools for your particular context.

Factors to consider when choosing data collection tools

There are many questions an evaluator or evaluation team must ask when designing a data collection project:

1) What’s the budget?

Cost is a huge factor when deciding how to collect data for an evaluation. Sending field workers out to collect data in person is hugely expensive, as is computer-assisted telephone interviewing (CATI). Programming online data collection forms, as well as cleaning and validating data can also be costly, depending on the program.

Fortunately, there are free data collection tools available that can save on costs, reduce the likelihood of errors, and make quality data collection possible for all different types of organizations. Khulisa and its partners were early adopters of Open Data Kit (ODK), an open-source data collection tool, and have recently moved to KoboToolbox, a free tool provided by a technology non-profit, for several of its evaluations in the education sector. Kobo is user-friendly, allowing enumerators to collect data in any language, anywhere in the world, using phones or tablets. These features make Kobo a great fit for Khulisa’s Schools 2030 project, which involves surveying in dozens of schools across nine countries. “The huge benefit of KoboToolbox is that it’s free. It’s used as a humanitarian tool,” says Khulisa Monitoring and Evaluation Associate Jesse Webb. “And that’s really incredible because we can work with people all over the world and ask them to get this tool.”

“[Kobo] just made it a lot more user friendly, and a lot of people can now build a survey, using functionalities that on ODK would have been more complicated because you needed to know a little bit of coding,” says Senior Monitoring Evaluation Research and Learning (MERL) Specialist, Leticia Taimo. “It’s a very powerful tool because it includes a lot of options within it to validate your data on the go, to use things like skip logic and native calculations,” Jesse says.
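To make "skip logic" and on-the-go validation concrete, here is a minimal Python sketch of the idea. It is not KoboToolbox's API or form format; the question names, ranges, and the check_submission helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical mini-survey: names and rules are illustrative assumptions,
# not taken from any Khulisa or KoboToolbox form.
QUESTIONS = [
    {"name": "age", "validate": lambda v: 5 <= v <= 120},
    {"name": "attends_school", "validate": lambda v: v in ("yes", "no")},
    # Skip logic: "grade" is only asked when the respondent attends school.
    {
        "name": "grade",
        "relevant": lambda answers: answers.get("attends_school") == "yes",
        "validate": lambda v: 1 <= v <= 12,
    },
]


def check_submission(answers):
    """Return a list of problems found in one submission (empty if clean)."""
    problems = []
    for question in QUESTIONS:
        name = question["name"]
        is_relevant = question.get("relevant", lambda a: True)(answers)
        if not is_relevant:
            # A skipped question should not carry an answer.
            if name in answers:
                problems.append(f"{name}: answered but should have been skipped")
            continue
        if name not in answers:
            problems.append(f"{name}: missing")
        elif not question["validate"](answers[name]):
            problems.append(f"{name}: invalid value {answers[name]!r}")
    return problems


print(check_submission({"age": 10, "attends_school": "no"}))
print(check_submission({"age": 200, "attends_school": "yes"}))
```

Running checks like these at the moment of entry, rather than during later cleaning, is what lets an enumerator correct a mistake while the respondent is still present.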

2) Who are the respondents?

When deciding on a data collection design, the question is: Who do we need the data from?

“If it’s young people with access to cell phones and internet and data, any of the electronic tools would work,” says Education and Development Director Margie Roper. “If it’s an enumerated survey, with field workers, you can use a tool that requires training to collect the data. If it’s a group for which an electronic device may be intimidating [like older people, or anyone without access to the internet], then you could go for paper-based. But we’re really trying to move away from paper-based because there are a lot more errors in the whole process of capturing the data and cleaning it.”

In some cases, there are specific data collection tools available for respondents in a particular practice area. RTI International has developed a tool called Tangerine, designed specifically for early grade reading and math literacy assessments. “A lot of organizations use it worldwide because it is the only [data collection tool] that was developed specifically for the early grade reading assessment… We wouldn’t be able to administer the EGRA in the way that we do if we didn’t use this tool,” says Leticia.

For complex projects with more than one type of respondent, a combination of survey tools often works best. In the case of setting early grade reading benchmarks in African languages, Khulisa uses Tangerine for its learner assessments and Kobo for its teacher surveys.

3) Is there online access?

Inconsistent internet access and/or cellular coverage often necessitates innovative data collection design. When conducting surveys in rural areas, or anywhere that online coverage is challenging, it’s important to choose tools that function both online and offline.

Tangerine uses tablets that can collect data both with and without cellular service. “When you are out there, and there is no network, you’re still able to actually do your assessment and then you send the data later,” says Evaluation Coordinator Tshandapiwa Tshuma, who works on Khulisa’s Tshivenda Language Benchmarking Project in rural South Africa. “So that’s one of the really good advantages with Tangerine.”

When collecting large amounts of qualitative data using laptops, which aren’t always able to access the internet as easily as phones or tablets, Khulisa sometimes uses a tool called Jotform. Jotform, which offers both free and premium versions, allows data collectors to record information using fillable PDF forms.

4) What kind of data are you collecting?

Data collection is not always about asking people questions and recording their answers. As the field of evaluation has grown broader and more wide-ranging, the data collection field has created progressively more sophisticated tools for various purposes.

Khulisa uses ArcGIS, an app based on geographic information system (GIS) mapping, to create interactive maps to track human trafficking hotspots. ArcGIS allows us to create heat maps showing locations such as high-density areas, areas with high concentrations of immigrants, and high-risk areas for trafficking in persons.

“Heatmaps are a useful tool to aggregate different data sources for the purpose of either demonstrating how funders can allocate resources for a given project, or to determine how a project might sample the population,” noted Thembi Mahlangu, Stakeholder Manager for Khulisa’s Measures for Countering Trafficking in Persons in South Africa (MCTIP) project. “The heat map is really talking to hotspots, to say, ‘Where is your highest risk area where human trafficking can occur, and how do we respond to that so that we have interventions that can help those people?’”
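The aggregation behind a heat map can be sketched in a few lines of Python. This is only an illustration of the general idea of combining weighted data layers on a grid, not how ArcGIS or the MCTIP project does it; the layer names, weights, cell size, and coordinates below are invented for the example and are not real data.

```python
import math
from collections import defaultdict


def risk_grid(layers, cell_deg=0.5):
    """Sum weighted point counts per grid cell.

    layers maps a layer name to (weight, [(lat, lon), ...]).
    Each cell is identified by integer (row, column) indices of size cell_deg.
    """
    scores = defaultdict(float)
    for weight, points in layers.values():
        for lat, lon in points:
            cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
            scores[cell] += weight
    return dict(scores)


# Invented layers and coordinates, for illustration only.
layers = {
    "layer_a": (1.0, [(-26.2, 28.1), (-33.9, 18.4)]),
    "layer_b": (2.0, [(-26.3, 28.2)]),
}

grid = risk_grid(layers)
hottest_cell = max(grid, key=grid.get)
print(grid)
print(hottest_cell)
```

The weights are where judgement enters: deciding how much each data source should count is an evaluation decision, not a technical one.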

5) What are the challenges?

Some evaluations involve challenges – difficult terrain, political instability, etc. – that require major considerations in the data collection design process. In 2020, the explosion of the COVID-19 pandemic forced evaluators around the world to reconfigure their data collection strategies.

When South Africa’s pandemic lockdown began, the Khulisa team realized it would not be able to conduct in-person interviews of school principals and staff for the Early Grade Reading project. So, the team turned to Geopoll to conduct interviews via CATI. Khulisa has also used Viamo, a company specializing in mobile communication, for CATI data collection.

The Big Question: What is the role of artificial intelligence (AI) in data collection and analysis?

AI technology has recently exploded, and there are many new AI programs and applications available to speed up analysis and writing. But the data science considerations discussed above are still crucial when designing data collection processes and activities. AI can speed up processes, but it doesn’t possess the same contextual information as human evaluators. The job of an evaluator is to make value judgements, and AI doesn’t have values. The culture and context of the people from whom you are collecting the data are vital to analyzing the data accurately.

Khulisa will continue to monitor new AI going forward to determine how to best use this evolving technology in data collection. You can watch the recording of our gLOCAL webinar discussing the science behind data collection and the role that AI might play.

Watch the webinar recording

View the presentation slides for the webinar here (PDF 4.55MB).

