A data collection plan is just that – a plan for how the information your program hopes to collect will flow from its source all the way to the actionable insights you hope to glean from it. The process of developing this plan will reveal things about where your data comes from, who has access to it, and how it is collected and stored – all of which are key pieces of information that will inform the design and implementation of any new system you choose.
Gillian Javetski, COO & Co-Founder of TecSalud — an ICT4D company in Bogotá and Cambridge — explains: “When you have to map out your project from square one, it opens your eyes to gaps you didn’t see earlier and that technology may not be able to fix. All of a sudden, the conversation may shift from ‘What do we want this technology to do’ to ‘Wait, actually, is the problem in our workflow?’”
Once you have clarified your project objectives, taken stock of your data requirements, and determined the method of data collection you hope to employ, it’s time to put all the pieces together.
Data collection programs can be a complex interaction of data sources and collection techniques
Two approaches to a data collection plan
There are two primary methods of organizing a data collection plan that we typically use. One is more visual and maps out the flow of information specific to that program. The other is more analytical, applying a standard set of criteria to the process for you to fill out in a way that makes sense for your program. Each has its strengths and weaknesses, but they share the goal of documenting your data collection plan in a way that can be shared, analyzed, and improved.
Information workflow diagram
A workflow map is a diagram of components and their connections throughout a particular process. When used correctly, workflow maps can increase program efficiency, reduce errors, and improve outcomes. When designed well, they portray a clear beginning and end. After reviewing a workflow map, someone unfamiliar with the program should be able to explain it clearly from start to finish.
Workflow maps come in several forms, from organizational hierarchies to application workflows and information flow diagrams. An information flow diagram is what we most often use when mapping how information moves through an existing data collection process. It typically starts with what data is being collected (e.g., quantitative vs. qualitative data, or specific variables), follows how it is collected (e.g., paper forms vs. mobile devices), and ends with where it is stored and how it is shared from the bottom to the top of your organization (e.g., a presentation vs. an online dashboard).
This is a very basic version of an information flow diagram based on a typical CommCare project.
In this basic version, data from beneficiaries is collected by community health workers (or other data collectors) using a mobile data collection app that wirelessly sends data to the cloud. There, it is accessed by a program manager or analyst on a desktop platform. Of course, this version doesn’t include what type of data it is or how that program analyst shares reports with their superiors, funders, or the government. That is exactly the type of information covered in more complex information flow diagrams.
Two key questions to ask when designing your information flow diagram are (1) What are the major milestones that occur in this process? And (2) What are the major component types (e.g. actions/activities, documents, decisions, etc.)? The answers to these questions are like the pieces to your puzzle. Once you collect them all, start with the outside and work your way in. In other words, begin with your data source and your final output and then fill out the pieces in-between.
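One way to try the "outside in" approach is to write the pieces down as a small directed graph and check that the data source actually connects to the final output. The sketch below is illustrative only: the component names, their types, the "report" output, and the `trace` helper are assumptions loosely based on the basic flow described earlier, not part of CommCare or any diagramming tool.

```python
from collections import deque

# Hypothetical components, each tagged with a component type.
components = {
    "beneficiary": "data source",
    "community health worker": "actor",
    "mobile app": "collection tool",
    "cloud server": "storage",
    "program manager": "actor",
    "report": "final output",
}

# Directed flows: information moves from the first item to the second.
flows = [
    ("beneficiary", "community health worker"),
    ("community health worker", "mobile app"),
    ("mobile app", "cloud server"),
    ("cloud server", "program manager"),
    ("program manager", "report"),
]


def trace(flows, start, end):
    """Return one path from start to end along the flows, or None if there is a gap."""
    neighbours = {}
    for src, dst in flows:
        neighbours.setdefault(src, []).append(dst)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Start with the outside pieces (source and output) and see what lies between.
path = trace(flows, "beneficiary", "report")
print(" -> ".join(path) if path else "Gap: the source never reaches the output")
```

If `trace` returns `None`, some piece between the source and the output is still missing from the map, which is a prompt to go back and ask where that information actually goes.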
The beauty of an information flow diagram is that you can read the same diagram from bottom-to-top and notice different things about your data collection process than if you review top-to-bottom. The change in perspective will help reveal something about the way your data is used and potential means for improving it.
Bonus tip: We like to use a platform called Draw.io to create our workflow maps.
Data collection plan outline
The other way we often analyze the data collection process is by filling out a data collection plan outline. It helps organize each variable you are collecting by source, its method of collection, timeline, where it is stored, and how it is analyzed and shared.
The first few rows of a data collection plan outline from AmeriCorps.
Compared to the information flow diagram, which maps data through your entire program visually, a data collection plan outline typically summarizes relevant characteristics of each variable in a table or chart. This approach does not quite measure up to an information flow diagram for viewing your program’s strengths and weaknesses from a high level, but it’s great for organizing detailed notes on how each variable is collected, who has access to what, and even how it might be analyzed. You can also find useful patterns by reading across a single row of the chart. For instance, in the chart above, you can see how the source of your data might differ between data points #1, #2, and #3 by reading across row #2 (“source of data”).
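If you keep the outline in a spreadsheet or script rather than on paper, reading across a row becomes a simple lookup. The following is a minimal sketch under stated assumptions: the variables, field names, and values are invented for illustration and are not taken from the AmeriCorps outline.

```python
from collections import defaultdict

# Hypothetical outline: one entry per variable, one key per characteristic.
plan = [
    {"variable": "attendance", "source": "program staff",
     "method": "paper form", "timeline": "weekly",
     "storage": "spreadsheet", "use": "monthly report"},
    {"variable": "household size", "source": "beneficiary",
     "method": "mobile survey", "timeline": "at enrollment",
     "storage": "cloud server", "use": "targeting"},
    {"variable": "satisfaction", "source": "beneficiary",
     "method": "mobile survey", "timeline": "quarterly",
     "storage": "cloud server", "use": "program improvement"},
]


def read_row(plan, field):
    """Read one characteristic across every variable (one 'row' of the chart)."""
    return {entry["variable"]: entry[field] for entry in plan}


def group_by(plan, field):
    """Group variables that share the same value for a characteristic."""
    groups = defaultdict(list)
    for entry in plan:
        groups[entry[field]].append(entry["variable"])
    return dict(groups)


print(read_row(plan, "source"))
print(group_by(plan, "source"))
```

Grouping by a field such as `source` or `storage` is a quick way to see which variables depend on the same people or systems.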
One reason we like the AmeriCorps version of this outline is that it ends with “How will the data be used for program improvement?” It is a good reminder that, regardless of your data collection program objectives, you can always examine your results with an eye to improving your final output. Addressing that question early and deliberately is a good way to make it a habit and improve the sustainability of your program.
As you develop your plan, it’s not uncommon to begin considering aspects of your program you hadn’t thought about before. That is by design. It’s much better to head into the design and implementation phase of your program aware of these facets than to build them into a program retroactively.
Often, these planning frameworks don’t include the dimension of time. Consider ways you might incorporate it to account for how often your frontline workers will head into the field to collect data or how often you will lead them in refresher training sessions.
Approvals and consent
Depending on the type of data you collect and how it flows through your program, you may need to request consent from beneficiaries or approvals for data integration from another organization. Think about how you might note these potential bottlenecks on whatever method of planning you choose.
What other aspects of your program might you need to account for? It’s different for everyone. Is there a particular consequence of selection bias you might need to account for? Might you need to include two methods of data collection to collect different types of data in the same program? Once you have organized your data collection plan, these types of considerations should be easier to spot.
Organizing your data collection plan isn’t that painful when you’re working with your team
Why use a data collection plan?
The most important reason to use frameworks like workflow maps and data collection plan outlines is that they help you to understand the stakeholders, data sources, and points of connection that will reveal areas for improvement and strengths to take advantage of.
For example, the Dimagi Data Science team analyzed the time between the collection of data and the submission of that data to the server in the 20 countries with the highest CommCare usage. Careful observation of the clinical workflow helped the team determine that 75% of CommCare users were using their application as an offline data collection tool. Understanding how the data flowed in these low- to no-connectivity environments made it possible to optimize surveys, flows, and the general user experience accordingly.
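To give a sense of what a lag analysis like this can look like, here is a minimal sketch. It is not Dimagi's method: the record layout, the sample timestamps, the one-hour threshold, and the `offline_share` helper are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumption: a form received more than an hour after it was collected
# is treated as having been collected offline.
OFFLINE_LAG = timedelta(hours=1)

# Hypothetical records: (user, collected_at, received_by_server_at)
submissions = [
    ("user_a", datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 1)),
    ("user_b", datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 2, 18, 30)),
    ("user_b", datetime(2024, 1, 3, 10, 0), datetime(2024, 1, 3, 10, 2)),
    ("user_c", datetime(2024, 1, 1, 14, 0), datetime(2024, 1, 1, 20, 0)),
]


def offline_share(submissions, threshold=OFFLINE_LAG):
    """Fraction of users with at least one submission lagging beyond the threshold."""
    users = set()
    offline_users = set()
    for user, collected_at, received_at in submissions:
        users.add(user)
        if received_at - collected_at > threshold:
            offline_users.add(user)
    return len(offline_users) / len(users) if users else 0.0


print(f"Share of users showing offline use: {offline_share(submissions):.0%}")
```

The threshold is the main design choice here. A real analysis would need to pick it based on how the app syncs and what counts as "offline" in the program's context.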
More recently, one of the most important uses of an information flow diagram has been to assess privacy risks related to data protocols. The EU’s General Data Protection Regulation (GDPR) forced millions of organizations working with user data from the EU market to examine their data flow to uncover any potential violations of the regulation before it went into effect.
In each of these cases, the effort to review every aspect of an existing process and map the interactions between them made for a better final product. This is not a coincidence. These projects are made up of interacting components, and if you can understand how each variable relates to the others in your data collection process, you can build a map that gives you strong insights for improvement and tells you precisely where to focus your efforts.