This is post #2 in our Under the Data Tree blog series.

How concerned should we be if we notice a CommCare user submitting less data than they have been submitting recently? Our intuition is that most CommCare users will submit about the same amount of data from month to month. While there are seasonal effects and other reasons why responsibilities might vary, we hypothesize that most of the time, a user’s responsibilities don’t shift quickly. Many CommCare users need to follow up regularly with their cases; the number of cases they have changes relatively slowly as new ones are registered and others are discharged. Additionally, we expect that an individual user will tend to be consistently high-performing or low-performing over time, both in fulfilling their responsibilities and in how well they use CommCare.

Testing our intuition about user consistency is an important part of our effort to understand and improve user behavior. If we know how much data to expect from each user, we can create alerts if they don’t meet those expectations. These alerts can be used to trigger follow-up or other interventions by project supervisors. Having a baseline for user activity will also help us evaluate the effectiveness of such an intervention – if we improve user behavior in one month, can we expect that improvement to carry over to future months?

To assess user consistency, we evaluated 2,831 users who actively used CommCare for at least six months. These users were from 144 projects worldwide. Based on the analysis in the previous blog on the speed of CommCare uptake, we only included data starting six months after these users first started using CommCare. We also included only users who never recorded more than 100 visits in a single month. Users who exceeded that threshold are possibly non-human users with automatic data upload patterns, so they were excluded from this analysis. Our final dataset contained 22,854 pairs of consecutive months across all eligible users.
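The filtering steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the row layout, the integer `month_index` (for example `year * 12 + month`), and the function name are all hypothetical, and we assume that a month counts as "active" simply because a row exists for it.

```python
from collections import defaultdict

MAX_VISITS_PER_MONTH = 100  # from the post: drop likely non-human users
WARMUP_MONTHS = 6           # from the post: skip each user's first six months


def consecutive_month_pairs(rows):
    """Build (user_id, cases_this_month, cases_next_month) tuples.

    rows: iterable of (user_id, month_index, visits, cases_visited).
    The row layout and month_index are hypothetical, chosen for illustration.
    """
    by_user = defaultdict(dict)
    for user_id, month, visits, cases in rows:
        by_user[user_id][month] = (visits, cases)

    pairs = []
    for user_id, months in by_user.items():
        # Exclude the whole user if any month exceeds the visit threshold.
        if any(visits > MAX_VISITS_PER_MONTH for visits, _ in months.values()):
            continue
        first_month = min(months)
        for month in sorted(months):
            if month < first_month + WARMUP_MONTHS:
                continue  # still inside the uptake period
            if month + 1 in months:
                pairs.append((user_id, months[month][1], months[month + 1][1]))
    return pairs


# User "a" is active for nine months; user "b" exceeds 100 visits and is dropped.
rows = [("a", m, 10, m + 1) for m in range(9)]
rows += [("b", m, 150, 5) for m in range(9)]
print(consecutive_month_pairs(rows))  # [('a', 7, 8), ('a', 8, 9)]
```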

We found a high correlation between the number of unique cases that a user visits from one month to the next. The R-squared value was 0.67 with a 95% confidence interval (CI) of 0.66 – 0.68, where an R-squared of 1.00 indicates a perfect correlation. As an illustrative example, we focused on 1,001 users who were active in both March 2014 and April 2014. This subset is visualized in Figure 1 below, which plots the number of cases each user visited in April 2014 against the number they visited in March 2014. Shading shows the density of user observations, with dark blue indicating the highest density and light blue a lower density. If every user had visited exactly the same number of cases in both months, all observations would fall on a 45-degree diagonal line, with no density outside that line. We do not see such perfect consistency, but we do see a strong correlation between the two months. The highest density is in the lower left of the graph, indicating a large number of users who visited fewer than ten cases in both months.


Figure 1: Number of cases visited in April 2014 by number of cases visited in March 2014 (N = 1,001 users)
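For readers who want to compute a similar statistic on their own data, here is a small sketch. For two variables, R-squared in a simple linear regression equals the square of the Pearson correlation coefficient. The post does not say how its confidence intervals were calculated, so the Fisher z interval below is our assumption for illustration, and the sample numbers are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared_with_ci(xs, ys, z_crit=1.96):
    """Return (r_squared, lower, upper) for an approximate 95% interval.

    Assumption: a Fisher z interval on r, with the bounds squared afterwards.
    This needs n > 3 and 0 < r < 1; it is one reasonable choice, not
    necessarily the method behind the numbers in the post.
    """
    n = len(xs)
    r = pearson_r(xs, ys)
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    lower_r = math.tanh(z - half_width)
    upper_r = math.tanh(z + half_width)
    # Clamp at zero so squaring keeps the bounds in order.
    return r * r, max(lower_r, 0.0) ** 2, upper_r ** 2


# Made-up example: cases visited by six users in two consecutive months.
march = [2, 5, 9, 14, 20, 31]
april = [3, 4, 11, 12, 24, 28]
print(r_squared_with_ci(march, april))
```

With only a handful of users the interval is very wide; it tightens as the number of month pairs grows.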

We were able to further illustrate this question of user consistency at the project level. Figure 2 below compares users from Project A (left) vs. Project B (right), showing the number of cases visited in a given month by the number of cases visited by the same user in the previous month. Project A shows a higher level of consistency in user activity from month to month (R-squared = 0.75), while Project B shows a low level of consistency in user activity (R-squared = 0.36).


Figure 2: Project A (left) vs. Project B (right) – Number of cases visited in a given month by the number of cases visited in the previous month

As mentioned at the beginning of this blog, we hypothesized that individual users would tend to remain either high-performing or low-performing over time. Figure 3 below shows user consistency for the top 10% of users (left) and the bottom 10% of users (right). To identify the top and bottom performers each month, we selected all projects that had at least 30 active users that month. We then identified the 10% of users who visited the most cases and the 10% who visited the fewest cases. In Figure 3, the number of cases visited in the user’s high- or low-performing month is on the y-axis, while the number of cases they visited in the previous month is on the x-axis.


Figure 3: Top 10% users (left) vs. bottom 10% users (right) – Number of cases visited in a given month by the number of cases visited in the previous month

These graphs suggest that a user who performs in the top 10% is likely to have been previously performing at a high level (R-squared = 0.69). On the other hand, a user who performs in the bottom 10% was not necessarily a low performer in the previous month (R-squared = 0.34).
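The selection of top and bottom performers described above could look something like the sketch below. The record layout and function name are hypothetical. We also assume the ranking is done within each project and month; the description above could equally be read as pooling users across all qualifying projects.

```python
from collections import defaultdict

MIN_ACTIVE_USERS = 30  # from the post: only projects with at least 30 active users


def top_and_bottom_users(records):
    """Return (top, bottom) sets of (user_id, month_index) pairs.

    records: iterable of (project_id, month_index, user_id, cases_visited).
    Assumption: users are ranked within their own project and month.
    """
    groups = defaultdict(list)
    for project_id, month, user_id, cases in records:
        groups[(project_id, month)].append((cases, user_id))

    top, bottom = set(), set()
    for (_, month), users in groups.items():
        if len(users) < MIN_ACTIVE_USERS:
            continue
        users.sort(key=lambda item: item[0])  # ascending by cases visited
        k = max(1, len(users) // 10)          # size of each 10% group
        bottom.update((user_id, month) for _, user_id in users[:k])
        top.update((user_id, month) for _, user_id in users[-k:])
    return top, bottom


# Project "p" has 40 active users and qualifies; project "q" has 10 and is skipped.
records = [("p", 1, f"u{i}", i) for i in range(40)]
records += [("q", 1, f"v{i}", i) for i in range(10)]
top, bottom = top_and_bottom_users(records)
print(sorted(top))     # users u36 to u39 in month 1
print(sorted(bottom))  # users u0 to u3 in month 1
```

Ties at the cut-off are broken by input order here, which is an arbitrary choice; a real analysis would need a deliberate rule for them.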

So far, we’ve described user activity by the number of unique cases that a user visits in a given month. There are many other ways to measure how active a user is in a month. In future blogs, we’ll be narrowing down which measures we think are most useful. Table 1 below shows the R-squared value for several other candidate measures of user activity. As you can see, there is some variation in how consistent a user is based on these different measures. For example, the number of new cases registered by a user from month to month has an R-squared of 0.53. This lower level of consistency makes sense since the registration of new cases often happens in bursts, not necessarily in steady, consistent quantities. However, some of the other measures support the theory that users have consistent activity levels (e.g. percent of days in a month that the user submitted data, R-squared = 0.75, suggesting that users are very consistent in the percent of days that they are active from month to month).

Table 1: Candidate measures of monthly CommCare activity – correlation in user activity for two consecutive months

| Measure of user activity each month | User consistency: R-squared [95% CI] |
| --- | --- |
| Percent of days in a month that the user submitted data | 0.75 [0.75 – 0.76] |
| Number of unique cases followed-up by the user | 0.72 [0.72 – 0.73] |
| Number of unique cases visited (registered or followed up); this is the measure we visualized above | 0.67 [0.66 – 0.68] |
| Number of forms the user submitted | 0.67 [0.66 – 0.68] |
| Number of visits the user made | 0.63 [0.63 – 0.64] |
| Number of new cases registered by the user | 0.53 [0.52 – 0.54] |
| Number (median) of visits that the user made per active day | 0.49 [0.48 – 0.50] |

Finally, if a user is becoming less active on CommCare, one concern is that they are heading towards a terminal attrition in their CommCare usage. To investigate this, we analyzed 1,055 instances where a user stopped using CommCare for at least 3 months. We only included instances where the user was active in the preceding four months – this allowed us to track users’ activity levels leading up to their 3-month attrition event(s).

Figure 4: Activity levels among users with attrition – Number of cases visited by number of months prior to 3-month attrition event

Figure 4 shows the median number of cases visited in the months leading up to when a user stopped using CommCare for at least 3 months. As you can see, there is a clear trend of declining activity from the fourth month before an attrition event to the month just before a user stops using CommCare.
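To make the attrition definition concrete, here is one way the lead-up medians could be computed. This is a sketch under stated assumptions: the nested-dictionary layout, the integer month index, and the `last_observed_month` cut-off (used so that a gap at the very end of the data is not mistaken for attrition) are all hypothetical.

```python
from statistics import median

GAP_MONTHS = 3      # from the post: no activity for at least 3 months
LEAD_IN_MONTHS = 4  # from the post: active in the preceding four months


def attrition_profile(cases_by_user, last_observed_month):
    """Median cases visited in each of the four months before a gap.

    cases_by_user: {user_id: {month_index: cases_visited}}, active months only.
    Returns {months_before_gap: median}, where key 1 is the month just
    before the user stops and key 4 is the fourth month before.
    """
    samples = {offset: [] for offset in range(1, LEAD_IN_MONTHS + 1)}
    for months in cases_by_user.values():
        for last_active in months:
            gap = range(last_active + 1, last_active + GAP_MONTHS + 1)
            lead_in = range(last_active - LEAD_IN_MONTHS + 1, last_active + 1)
            if gap[-1] > last_observed_month:
                continue  # not enough follow-up to confirm a 3-month gap
            if any(m in months for m in gap):
                continue  # the user came back too soon
            if not all(m in months for m in lead_in):
                continue  # the user was not active in all four prior months
            for offset in samples:
                samples[offset].append(months[last_active - offset + 1])
    return {k: median(v) for k, v in samples.items() if v}


# Made-up data: "a" and "b" stop after month 4; "c" stays active throughout.
cases = {
    "a": {1: 20, 2: 18, 3: 12, 4: 6},
    "b": {1: 30, 2: 24, 3: 16, 4: 10},
    "c": {m: 9 for m in range(1, 8)},
}
print(attrition_profile(cases, last_observed_month=7))
# {1: 8.0, 2: 14.0, 3: 21.0, 4: 25.0}
```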

Coming back to the main question of this blog, it seems that a quick change in user activity levels is unusual. We should especially be concerned if there is a noticeable and unexplained decrease in user activity; users are generally consistent, and a pattern of declining activity could be a signal of an impending attrition event. From the programmatic angle, detecting such decreases early could be very helpful in improving frontline worker retention.
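As a closing illustration, an alert based on these findings might compare a user’s current month against their own recent baseline. The sketch below is purely hypothetical: the function name, the 50% drop threshold, and the minimum baseline are placeholder values we picked for the example, not recommendations from this analysis.

```python
from statistics import median


def should_alert(recent_counts, current_count, drop_fraction=0.5, min_baseline=5):
    """Flag a user whose current month falls well below their recent baseline.

    recent_counts: cases visited in the user's previous months.
    drop_fraction and min_baseline are hypothetical tuning values.
    """
    if not recent_counts:
        return False  # no history, so no baseline to compare against
    baseline = median(recent_counts)
    if baseline < min_baseline:
        return False  # too little activity to judge a drop reliably
    return current_count < baseline * (1 - drop_fraction)


print(should_alert([20, 22, 19], 8))   # True: 8 is below half of the baseline of 20
print(should_alert([20, 22, 19], 15))  # False: within the normal range
```

Using the median of several months rather than the previous month alone makes the baseline less sensitive to a single unusual month.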