Patrick Keating, Field Manager in West Africa, observes a matrone conducting a consultation and helps her with a question.
This week, I had the pleasure of meeting a matrone in Senegal (matrones are women in villages who provide pregnancy counseling to younger women) as part of a field visit with the local NGO, Africare. She was one of 30 matrones my Dimagi West Africa colleagues Carla Legros and Patrick Keating trained in Ziguinchor, Senegal for Africare’s CCHT project.
As part of the project, our team developed two CommCare applications. The first application is used by matrones to monitor completion of antenatal care check-ups, help identify pregnancy danger signs, and trigger referrals. The second application is used by nurses, who see referral cases created by the matrones, as long as their phones have synced via an Internet connection. After the training, the matrones had gone out to the market to buy themselves a beautiful blue and white patterned local fabric to… wait for it… make themselves dresses with blue and white patterns that resemble the CommCare logo. The matrone I met even modeled her CommCare dress for me!
If 2013 was the year of rapid growth at Dimagi, then 2014 was even faster. Our CEO Jonathan Jackson touches upon this growth in his annual letter, focusing on how he’ll remember 2014 as the year that Dimagi hit 100 people. In addition to growing the size of our company, a lot of other big milestones occurred in 2014. The list below includes just a few of them (enjoy!)
We hit a big self-starter milestone. By the end of 2014, 40% of all CommCare projects were being run by self-starters, meaning organizations that developed and launched CommCare applications without any in-person help from Dimagi. This includes groups like Aquaya in Senegal, MIT GlobeMed in Togo, Civic Hire in Haiti, and Lift II in Malawi. Reaching this number means a lot to us, and shows CommCare’s development into a sustainable, accessible platform.
We began scaling CommCare nationally in Haiti through a project with Pathfinder and URC, our biggest CommCare deployment to date. Haiti set the tone for taking on more nationally scaling CommCare projects, including one in Mozambique with Ariel Glaser.
We launched projects in new sectors, including a social apps project in India, a cash transfer app for WFP in Zambia, and our first regional microfinance project with Small Enterprise Foundation in South Africa. Dimagi also hosted its first mEducation roundtable and women’s empowerment workshop. We also tested and validated several CommCare use cases, including warehouse storage for agricultural programs and CommCare for Sales Agents. We talked about our findings in these new sectors at this year’s Global mHealth Forum in DC.
We focused on maturity and how to achieve economies of scale. This led to us developing and testing our maturity model in South Africa. We’re excited to introduce it to the world in March.
We grew our logistics work with our logistics platform, CommTrack. The first CommTrack v.2 project was launched in Senegal. We also helped four organizations in India and Nepal deploy CommTrack proofs of concept.
We established new teams like our R&D team and our data research team. Our R&D team is testing and developing CommCare features in Burkina Faso and India, while our data research team analyzed CommCare trends, some of which can be seen in their “Under the Data Tree” blog series.
We grew our team, including hitting the triple digits. But our growth in size didn’t stop us from prioritizing spending time with each other. In 2014, we had our second Away Month in Guatemala, the first West Africa Summit, and our first tech team summit.
Want to learn more about any of these 10 things? Feel free to email me at firstname.lastname@example.org.
As we prepped for our annual meeting at Dimagi this year, I found the first slide I’d put together for our annual meeting six years ago in 2008. It opened with: “We would rather be a team of 15 people that love their job than 100 people that like their job.” When the next year’s meeting came up in 2009, we’d grown a bit so I had to change it to 20 people. In 2010 I had to change it again, this time to 25. At a certain point, I eventually deleted that line because it had become a running joke that we would keep increasing the number to match our headcount every year.
2014 was an amazing year for Dimagi, and one I’ll always remember because we hit 100 people. Four years ago, our global services team was a few scrappy field managers in Tanzania, trying to figure out if this thing we were calling CommCare could work. Now, we have a full-fledged (and just as scrappy) 45+ person team around the world. Similarly, prior to 2010, we didn’t have any fully self-service products so it wasn’t even possible to deploy without Dimagi services. Now, we have a large tech team, and a product where 40% of our users are building apps without Dimagi’s direct help, including self-starters like Aquaya in Senegal, MIT GlobeMed in Togo, Civic Hire in Haiti, and Lift II in Malawi.
As our company has evolved rapidly every year, I’ve come to appreciate that the 100 person barrier wasn’t the key takeaway from my old phrase, but rather the focus on making sure we all love our jobs. So instead of focusing on size, we start with a slide that says: “Dimagi’s bottom line is impact, team satisfaction, and profit, in that order.”
Certainly, as we’ve grown, how we create impact, team satisfaction, and profit has changed significantly. Having a bigger team has enabled us to do things that we couldn’t have delivered on when we were smaller. We couldn’t provide excellent support and quality to our customers without our first support lead, Nate Haduch, or our first QA lead, Christy White. We couldn’t provide the amazing services we do without our country directors and everyone in the field in Senegal, India, Mozambique, Guatemala, Myanmar, South Africa, Zambia, and the United States.
Having a bigger team has given us greater bandwidth, including the chance to focus more on innovation and answer bigger, tougher questions. In 2014 we developed a Research and Development team that’s working on making better mobile features for projects in Burkina Faso, the same features that will eventually be adopted into our open source platform for everyone to access. This year we also established a data research team who can now dedicate time to analyze CommCare data trends, some of which can be seen in their “Under the Data Tree” blog series. We have project managers like Fiorenzo Conte in Senegal who are now able to focus in on prototyping new uses for our tools like CommSell. We’re able to dedicate time for people like Saijai Liangpunsakul, Devika Sarin, & Rushika Shekhar to focus their efforts on spending time in countries like Myanmar, where mHealth has just arrived.
Of course, as our team has grown, we’ve had to add more processes, communication channels, and best practices, as companies tend to do when they grow. We try to do our best to balance the autonomy and decisions we expect each team member to make with efficiency, safety and value to our customers. Many of us started or joined Dimagi because we loved the idea of working for a small, tight-knit team and were worried about growing bigger. But we’ve learned it’s not an either-or decision. To everyone who works here or has worked here, I think we are immensely proud that we’ve been able to maintain that tight-knit culture but adapt it to a 100 person organization.
While I’ll always remember 2014 as the year that we hit 100 people, I’ll also remember it as another year where we’ve managed to retain a company-wide spirit of folks who “…would rather be a team of 15 people that love their job than 100 people that like their job.” Whether it’s 15 or 100, our passion remains as high as it’s always been and our outlook for partnering to create impact is better than it’s ever been. Now let’s see what we can do in 2015.
We had multiple recent issues that have resulted in problems with many parts of CommCare HQ. This post is meant to provide some clarity into those issues for our customers. This post is a bit technical but hopefully can provide information to people who have been frustrated with CommCare HQ over the past few days.
Above all else, all data is completely safe and secure.
However, some parts of the site are not completely functioning. You can skip to the end of this post for a summary of what is working as of this writing, or keep reading to learn the details around what happened.
Issue #1: General Site Slowness
In order to understand the root cause of the site slowness, you have to know a bit about CommCare HQ architecture.
To start with, CommCare HQ is built primarily on top of a database called Cloudant, which is an optimized, scalable implementation of the open-source database CouchDB. This is our primary data store: the place where the vast majority of our data is first saved. We also use Postgres, ElasticSearch, and Redis; however, these are mostly used as secondary data stores (more on that later).
Last Friday (January 30) an event in our Cloudant database (the root cause of which we are still getting to the bottom of) caused a large amount of system resources to be taken up, the net result of which was that all requests to the database became extremely slow – often to the point of being unusable or timing out. This makes up the vast majority of the performance issues on CommCare HQ that users have been experiencing since January 30. The load/slowness reached a peak on Monday and Tuesday, and started to improve on Wednesday. As of this writing (Thursday Feb 5) it has still not returned to normal, but the site is fully functional again. We are still working with Cloudant to better understand the root cause and make sure we can prevent it from happening moving forward.
Issue #2: Exports and imports were not working
Unfortunately, a lot of people spent a lot of time staring at this screen…
CommCare HQ is built in Python on top of a web framework called Django. For most normal operations on the site, a web request goes to one of our Django workers, and the worker processes the request and returns a page containing the information that was requested.
However, for requests that take a long time, like imports and exports, we use a different piece of infrastructure that runs these jobs in the background and then notifies the user when they have completed. Any operation on CommCare HQ that we expect to take more than a minute or so is run through this background infrastructure (which uses an asynchronous job queue called Celery). Almost anything that uses a progress bar on the site is done using this infrastructure.
Our Celery infrastructure has a fixed number of workers that can process jobs at any given time. We can change this number, but doing so requires adding additional hardware to our cluster. When Cloudant became very slow, these jobs, like everything else on CommCare HQ, took much longer than expected to complete. Exports that were normally quite fast were taking several hours. As a result, all of the available slots for background work quickly became full and a huge list of jobs queued up waiting for a background worker to take them. Most of these jobs were eventually completed; however, by the time they finished, most users had already closed the page, assuming that the job would never finish.
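To make the background-job idea concrete, here is a minimal sketch using only Python's standard library. It is not CommCare HQ's code and it does not use Celery; `ExportJob` and `start_export` are hypothetical names, and a thread pool stands in for the real job queue. It only illustrates the pattern of starting a slow job, returning immediately, and letting the page poll for progress.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ExportJob:
    """Hypothetical stand-in for one long-running export."""

    def __init__(self, total_rows):
        self.total_rows = total_rows
        self.done_rows = 0
        self._lock = threading.Lock()

    def run(self):
        for _ in range(self.total_rows):
            time.sleep(0.001)  # pretend to read one row from the database
            with self._lock:
                self.done_rows += 1
        return f"exported {self.total_rows} rows"

    def progress(self):
        """Fraction of the job completed so far, from 0.0 to 1.0."""
        with self._lock:
            return self.done_rows / self.total_rows


# A small, fixed pool of background workers (assumed size, for illustration).
executor = ThreadPoolExecutor(max_workers=2)


def start_export(total_rows):
    """Queue the export and return right away, as a web request would."""
    job = ExportJob(total_rows)
    future = executor.submit(job.run)
    return job, future


if __name__ == "__main__":
    job, future = start_export(50)
    while not future.done():
        print(f"{job.progress():.0%} complete")  # what a progress bar would poll
        time.sleep(0.01)
    print(future.result())
```

The web request only pays the cost of queuing the job; the slow work happens on a background worker, and the progress bar is just repeated polling of the job's state.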
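The way a fixed pool of workers fills up can be sketched with a small minute-by-minute simulation. All of the numbers below (4 workers, 2 new jobs per minute, 1-minute versus 120-minute jobs) are made up for illustration and are not measurements from CommCare HQ.

```python
def simulate_backlog(workers, job_minutes, arrivals_per_minute, total_minutes):
    """Return how many jobs are still waiting for a worker after total_minutes."""
    waiting = 0
    running = []  # remaining minutes for each job currently holding a worker
    for _ in range(total_minutes):
        waiting += arrivals_per_minute
        # Free workers pick up as many waiting jobs as they can.
        free = workers - len(running)
        started = min(free, waiting)
        waiting -= started
        running.extend([job_minutes] * started)
        # Advance the clock by one minute and release finished workers.
        running = [left - 1 for left in running if left - 1 > 0]
    return waiting


if __name__ == "__main__":
    # Hypothetical figures: the same load with a fast and a slow database.
    print("fast database:", simulate_backlog(4, 1, 2, 60))
    print("slow database:", simulate_backlog(4, 120, 2, 60))
```

With fast jobs, workers free up as quickly as jobs arrive and the queue stays empty. Once each job holds a worker for hours, every slot is taken within a couple of minutes and each new request simply joins the queue.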
Once the database performance was mostly restored on Wednesday (Feb 4) the Celery workers were able to get through the backlog of jobs and everything is currently back to normal.
Issue #3: Stale data in many pages and reports
The last issue is that many pages and reports are showing stale or missing data. Again, understanding this requires understanding a bit more about CommCare HQ’s architecture.
We mentioned that Cloudant is CommCare HQ’s primary data store; however, many of the reports are built on secondary data stores that can be used to generate report data more efficiently.
Generally what happens when you submit a form to CommCare HQ is that we first save it to Cloudant and make sure that succeeds. Then a number of background processes look for recently saved forms and copy them into other data stores (Postgres and ElasticSearch). Once the form has been processed the background process saves a checkpoint indicating how far along it is.
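The checkpoint pattern described above can be sketched as follows. This is an illustration only, not CommCare HQ's implementation: `sync_new_forms` is a hypothetical name, a list of numbered entries stands in for the primary database's record of saved forms, and a plain dict stands in for Postgres or ElasticSearch.

```python
def sync_new_forms(primary, secondary, checkpoint):
    """Copy forms saved after `checkpoint` into `secondary`.

    primary: list of (sequence_number, form_id, form_data), oldest first
    secondary: dict standing in for a reporting data store
    Returns (new_checkpoint, number_of_forms_copied).
    """
    copied = 0
    for seq, form_id, form_data in primary:
        if seq <= checkpoint:
            continue  # already processed on an earlier pass
        secondary[form_id] = form_data
        checkpoint = seq  # remember how far we got
        copied += 1
    return checkpoint, copied


if __name__ == "__main__":
    primary = [(1, "form-a", {"visits": 1}), (2, "form-b", {"visits": 3})]
    reporting_store = {}
    checkpoint, copied = sync_new_forms(primary, reporting_store, checkpoint=0)
    primary.append((3, "form-c", {"visits": 2}))
    checkpoint, copied = sync_new_forms(primary, reporting_store, checkpoint)
    print(checkpoint, copied)
```

Because the second pass starts from the saved checkpoint, it only has to copy the one form that arrived since the first pass.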
An event on Friday (we are actually not yet sure if it was the same event that caused the other two issues or a separate one) caused many of these background processes to “reset” their checkpoints – meaning they could not tell the difference between a form that had already been processed and a new form coming in. This meant that instead of the typical 50,000 forms per day that needed to be processed, we had to go through a backlog of over 10 million forms and sync them to our other data stores. This combined with the severe slowness of the Cloudant database is causing the reindexing process to take a very long time. As a result, this is still in progress, and while we are looking for ways to speed up or shortcut the process, it is looking like we may just have to wait for a few more days for everything to get fully caught up.
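To put those numbers in perspective, a backlog of 10 million forms at the typical 50,000 forms per day is about 200 days of normal traffic. The rough estimate below shows how the catch-up time depends on processing speed. The throughput figure is purely hypothetical (the post does not state one), and `days_to_catch_up` is an illustrative helper, not part of CommCare HQ.

```python
def days_to_catch_up(backlog, daily_new_forms, daily_throughput):
    """Estimate days needed to clear a backlog while new forms keep arriving."""
    if daily_throughput <= daily_new_forms:
        raise ValueError("throughput must exceed the incoming rate to ever catch up")
    return backlog / (daily_throughput - daily_new_forms)


if __name__ == "__main__":
    backlog = 10_000_000      # from the post: "over 10 million forms"
    daily_new_forms = 50_000  # from the post: typical forms per day
    print("days of normal traffic in the backlog:", backlog / daily_new_forms)

    # Hypothetical reindexing speed, chosen only to show the calculation.
    assumed_throughput = 2_000_000
    print("estimated days to catch up:",
          days_to_catch_up(backlog, daily_new_forms, assumed_throughput))
```

The slower the primary database responds, the lower the achievable throughput, and the estimate grows accordingly.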
See the end of this posting for a full list of affected functionality.
While we know that the past few days have been incredibly frustrating and disappointing to our users, we are proud of the fact that the one user group that was not significantly affected during this time was the front line worker. The combination of CommCare mobile working very well offline and the fact that our architecture was designed in such a way that many performance issues would not affect mobile phones has continued to make it possible for front line workers to effectively do their jobs.
We do appreciate your patience with us during this frustrating time, and we are continually striving to learn from our mistakes and position CommCare and HQ to be ready to keep scaling with our user base.
Functionality that is still affected
The following is a list of the main functionality that is still affected by the stale data issue above as of the time of this writing. We will add to this list if any areas not listed come up.
Submit history report
Case history view for single cases
Daily Form Activity report
Form Completion Times report
Form Completion vs Submission Trends
The Case Activity Report will have the wrong numbers
Worker Activity report form submission columns
Errors & Warnings Summary
Device Log Details report
Some project-specific custom reports
CallCenter / Supervisor indicators
CallCenter / Supervisor indicators will have an incomplete list of forms
CallCenter / Supervisor indicators will have an incomplete list of user cases
Over the last fifteen years, feature phones capable of running third-party apps superseded basic voice-and-text devices. More recently, smartphone penetration has been increasing rapidly in the developing world. In particular, Android devices are now widely available in most countries, and over 10 million new smartphone connections were made every month in Africa during 2013 and 2014 (GSMA, 2014).
This post explores some mobile technology trends across a large number of projects using CommCare. The CommCare mobile app runs on both feature phones and smartphones. Data can also be collected via SMS or the web. This allows us to track trends and compare usage patterns across different technologies.