Datatree

 

This is post #4 in our Under the Data Tree Series, where members of Dimagi’s data science team share insights from analyzing CommCare data.

In analyzing usage data from CommCare, we’d expect that some Frontline Workers (FLWs) will use CommCare more regularly than other FLWs. But what is a typical distribution of activity for FLWs in the same project? Will they follow a Bell Curve, also known as the Normal Distribution? If so, it should look something like this distribution:

 

[Figure dt1: distribution of monthly activity levels for FLWs in one large program in India]

 

The above graph shows the distribution of activity level for FLWs in one of our larger programs in India. For this analysis, we measure activity as the percent of days in a given month on which an FLW submitted any data using CommCare. The graph shows the distribution across all months in which a user was active. We excluded months in which an FLW did not submit any data, because the FLW might have been on leave or might already have left their job, and because the results don’t change much if we include those months as zero activity.
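To make the activity measure concrete, here is a minimal sketch of how the percent of active days per user-month could be computed. It is not our production code: the function name `percent_active_days`, the `(user_id, date)` input shape, and the sample data are all assumptions made for illustration.

```python
import calendar
from collections import defaultdict
from datetime import date


def percent_active_days(submissions):
    """Map (user_id, year, month) to the percent of days in that month
    on which the user submitted at least one form.

    `submissions` is an iterable of (user_id, date) pairs, one per form
    (a hypothetical input shape). Months with no submissions never appear
    in the result, which mirrors excluding inactive months.
    """
    active_days = defaultdict(set)
    for user_id, day in submissions:
        # A set ensures several forms on the same day count only once.
        active_days[(user_id, day.year, day.month)].add(day.day)

    result = {}
    for (user_id, year, month), days in active_days.items():
        days_in_month = calendar.monthrange(year, month)[1]
        result[(user_id, year, month)] = 100.0 * len(days) / days_in_month
    return result


# Made-up sample data, for illustration only.
sample = [
    ("flw_a", date(2014, 4, 1)),
    ("flw_a", date(2014, 4, 1)),   # second form on the same day
    ("flw_a", date(2014, 4, 2)),
    ("flw_a", date(2014, 4, 15)),
    ("flw_b", date(2014, 2, 10)),
]
activity = percent_active_days(sample)
```

In this sample, `flw_a` is active on 3 of the 30 days in April, and `flw_b` on 1 of the 28 days in February.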

As you can see, these users closely follow a bell curve. The most common level of activity is right around the median, and the farther you get from the median in either direction, the fewer FLWs have that level of activity. Put more simply, the vast majority of FLWs are “middle performers” with an activity level near the average. There are relatively few high and low performers.

Although this one CommCare project follows a bell curve, it seems to be the exception rather than the rule across CommCare projects. The distribution of average activity for all 306 programs and 9,267 CommCare users that were active between 2010 and 2014 is:

[Figure DT2: distribution of average activity across all 306 programs and 9,267 users, 2010 to 2014]

 

As you can see, this is not a bell curve at all. In this distribution, most FLWs do not exhibit an average or near-average level of activity, as they would under a bell curve. Instead, there is a long tail of very highly active FLWs, as well as a large number of FLWs with low activity. That is, there are more high and low performers than middle performers. This is consistent with the observation that human behavior often does not follow a bell curve, but instead follows a “power law” or “long tail” distribution, which is described in this Forbes article.

The above graph represents all users across all of our projects. Based on the analysis from our first post in this series, we excluded the first six months of each user’s activity to see if that would change the distribution.

[Figure DT3: distribution of activity across all users, excluding each user’s first six months]

As you can see, the curve is shifted towards higher levels of activity if we exclude the first six months of each user’s activity. But even so, it still follows a power law.

These graphs show the distribution of all users, but do the majority of individual projects follow the power law? Below we show the graphs for the 12 projects with the most months of activity.

[Figure dt4: activity distributions for the 12 projects with the most months of activity]

As you can see, there is a variety of distributions, including some that do resemble a Bell Curve. But a Power Law curve is more common. And there is almost always a long tail of highly active FLWs.

For CommCare implementers, one reason this matters is that it may help you interpret statistics you generate from your users’ data. For example, suppose you compute that your FLWs submit an average (or, better yet, a median) of 38 forms per month. If you think usage follows a bell curve, you’ll conclude that most of the FLWs in your program are submitting about 38 forms every month. That is, you’ll expect most of your users to be middle performers. However, if usage follows a power law, then the majority of your users are either high or low performers and submit many more or many fewer than 38 forms.
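One quick way to check which situation you are in is to ask what share of your users actually sit near the median. The sketch below uses two made-up sets of monthly form counts, both with a median of 38; the helper name `share_near_median` and the 25% tolerance are arbitrary choices for illustration, not a standard test.

```python
from statistics import median


def share_near_median(values, tolerance=0.25):
    """Return (median, fraction of values within +/- tolerance of the median)."""
    mid = median(values)
    low, high = mid * (1 - tolerance), mid * (1 + tolerance)
    near = sum(1 for v in values if low <= v <= high)
    return mid, near / len(values)


# Two invented sets of forms-per-month counts with the same median.
bell_like = [30, 33, 35, 36, 37, 38, 38, 39, 40, 42, 45]
long_tail = [2, 3, 5, 8, 12, 38, 60, 90, 140, 210, 400]

bell_median, bell_share = share_near_median(bell_like)
tail_median, tail_share = share_near_median(long_tail)
```

Both lists report a median of 38, but every user in `bell_like` falls within 25% of it, while only one of the eleven users in `long_tail` does. The same summary number describes two very different programs.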

Another question is how the distribution of behavior changes over time. For the following graphs, we analyzed 623 FLWs from 31 projects with 18+ months of activity (whether continuous or not). We randomly excluded user-months of activity so that no single project contributed more than 10% of the data set. The graphs below show the distribution of these 623 FLWs’ performance for their first three months of usage, their second three months, and so on.
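For readers curious how a cap like this might be applied, here is one possible sketch. It is an assumption about how such a step could work, not a description of the exact procedure we ran: the function `cap_project_share`, the row layout, and the sample data are all hypothetical. It repeatedly drops a random row from the largest project until no project exceeds the cap.

```python
import random
from collections import defaultdict


def cap_project_share(user_months, max_share=0.10, seed=0):
    """Randomly drop rows until no project holds more than max_share of all rows.

    `user_months` is a list of (project_id, user_id, month_index) tuples
    (a hypothetical layout). Returns the rows that are kept.
    """
    rng = random.Random(seed)
    by_project = defaultdict(list)
    for row in user_months:
        by_project[row[0]].append(row)

    # With fewer than 1 / max_share projects, the cap can never be met.
    if len(by_project) * max_share < 1:
        raise ValueError("too few projects to satisfy the cap")

    # Shuffling first means popping from the end removes a random row.
    for rows in by_project.values():
        rng.shuffle(rows)

    total = len(user_months)
    while True:
        largest = max(by_project.values(), key=len)
        if len(largest) <= max_share * total:
            break
        largest.pop()
        total -= 1
    return [row for rows in by_project.values() for row in rows]


# Invented data: one dominant project and eleven small ones.
rows = [("big", f"u{i}", 0) for i in range(500)]
for p in range(11):
    rows += [(f"small{p}", f"u{i}", 0) for i in range(20)]
capped = cap_project_share(rows)
```

Note that the cap has to be rechecked after every removal, because dropping rows from the largest project also shrinks the total that the 10% is measured against.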

[Figure dt5: activity distributions for 623 FLWs by three-month period of usage]

You can see that users generally became more active as they used CommCare more, especially during the first six months. You can also see that plenty of users are high performers in their first three months, but many others have low activity in the first three months and then shift to higher activity afterwards (the mass of the distribution shifts to the right from the first graph to the second). For FLWs who continue to use CommCare for 18 months, it seems that each three-month period is more active than the last for at least the first year, after which activity may level off or even decline in the second year.
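If you want to look at the same shift in your own data, a simple starting point is to group each user’s months into three-month periods of tenure and compare a summary statistic across periods. The sketch below assumes rows of `(user_id, month_number, pct_active)`, where `month_number` counts from 1 at each user’s first month; the function name and sample values are invented for illustration.

```python
from collections import defaultdict
from statistics import median


def median_by_tenure_period(user_months, months_per_period=3):
    """Return {period_number: median activity}, with periods counted from 1.

    Months 1-3 of a user's tenure fall in period 1, months 4-6 in period 2,
    and so on.
    """
    buckets = defaultdict(list)
    for _user_id, month_number, pct_active in user_months:
        period = (month_number - 1) // months_per_period + 1
        buckets[period].append(pct_active)
    return {period: median(values) for period, values in sorted(buckets.items())}


# Invented activity values (percent of days active) for two users.
sample_rows = [
    ("a", 1, 10), ("a", 2, 20), ("a", 3, 30),
    ("a", 4, 40), ("a", 5, 50), ("a", 6, 60),
    ("b", 1, 20), ("b", 4, 70),
]
medians = median_by_tenure_period(sample_rows)
```

A median per period is only a summary, so given the distributions above it is worth plotting the full histogram for each period as well.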

The above analysis tells implementers of mHealth systems not to expect a bell curve of performance from their users. This should be taken into account when interpreting statistics (such as the median number of forms submitted) across users. It’s likely that there will be a large number of lower performers and a long tail of increasingly high performers, which likely presents different challenges and opportunities than if most users were middle performers with near-average performance. For example, you may want to provide extra training and support to your lower performers and find ways to further leverage the enthusiasm and capacity of your highest performers.

Thanks for checking out Under the Data Tree! Please feel free to comment with any thoughts, or send questions to datatree@dimagi.com.