Migrating a Legacy Codebase to RequireJS

April 2, 2018

At Dimagi, our main commcare-hq repository, a Django application, was created in 2009. How long ago was 2009? It’s the year Node was invented. Chrome was only a year old. JavaScript as a language — and an ecosystem — has come a long way since 2009, and our code is evidence of this: it’s a mix of styles and tools we’ve experimented with over the years, clearly written at different times by different people.

We’ve made progress on the consistency front, documenting and enforcing conventions. Now our attention has turned to the more technically challenging problems that arise from a large codebase using mostly older JavaScript techniques:

  1. Poor encapsulation: Much of our code is in the global namespace. A few areas deliberately use god-like global objects to manage complex configuration. These areas are terrifying to change, as it’s difficult to track all of the places these objects are modified and accessed.
  2. Lack of dependency management: Dependencies are implicit, enforced primarily by the order that script tags/blocks appear in HTML source. When combined with Django’s hierarchical template system, this gets prohibitively complex and, again, terrifying to change.

The same issues that make code resistant to change also make it cumbersome to debug and difficult for new developers to learn. Once we started discussing these problems, it was easy to agree that moving to explicit dependency management would be a valuable step forward.

A number of serviceable dependency management tools exist, but across the board, most of the documentation we read focused on single-page apps and writing an app from scratch. Our core codebase has about 100K lines of JavaScript, some of it tightly coupled with the even larger Django application. Transitioning our legacy code would be a massive undertaking…but worth it.

This is the first in a three-post series about migrating our codebase to use modern JavaScript dependency management:

  1. Decoupling JavaScript and Django
  2. Migrating pages to RequireJS
  3. r.js optimization and build process changes

This first post explains how we’ve decoupled JavaScript from Django templates: the rationale, the implementation details, and the process. Parts of it are specific to organizations using Django, but other parts are applicable to any organization separating its JavaScript from HTML content.

Why isolate JavaScript?

Our first step in moving to modern dependency management was to cleanly separate JavaScript out from HTML into dedicated .js files. Good luck wrapping your head around dependency management, much less using tools to help, when your JavaScript is scattered piecemeal across HTML files.

Isolating JavaScript would have benefits beyond preparing for RequireJS:

  • Readability: Mixing Python, HTML, and JavaScript in a template gets unwieldy quickly. Separating languages means less mental context switching when reading or writing code, keeping everybody saner. Poor readability also makes mixed-language code especially vulnerable to escaping bugs and syntax errors.
  • Code reuse: You could define a script block in a Django template and then include that template in multiple places, but it’s more natural to define that common code in a .js file that’s obviously JavaScript-only.
  • Code organization: Many of our inline script blocks were in service of some greater goal, such as setting up page analytics, and could benefit from being in a larger, well-organized JavaScript module that spans a set of related pages.
  • Testability: Our JavaScript test setup doesn’t have access to Django templates or their inline scripts.
  • Performance: Getting more JavaScript into static files allows us to take better advantage of compression and browser caching.

Should inline script blocks be fully eradicated? There are a couple of places where a small amount of truly isolated JavaScript might stay in a script block, but these are the exceptional case. One reasonable example of inline JavaScript is setting up global configuration options for less.js in the primary base template: this is just a few lines of code, and it does not relate to any other JavaScript in the codebase. On the other hand, it’s possible that for the sake of having clear conventions, even this case should move to a separate file.

Our Django-JavaScript coupling

This section is specific to organizations using Django. If that’s not you, skip ahead to the migration process.

Most of our inline script blocks were inline because they took advantage of Django functionality. Our JavaScript most often uses:

  • Translations
  • Passing server data to the client
  • Control flow
  • Client-related template tags like csrf_token

Decoupling meant finding pure JavaScript alternatives for each of these areas.

Translations

Our JavaScript is littered with code like this, containing user-facing strings:

<script>
  alert("{% trans "I love french fries." %}");
</script>

It’s easy to edit this code and mismatch, or leave off, a quotation mark. It’s also easy to use single quotes in the alert instead of double, which breaks the page for French users: the apostrophe in J’adore les frites ends the string early.

Happily, the django-statici18n package, which builds on Django’s JavaScript translation catalog, exists for this exact issue. Once that was set up, getting trans tags out of JavaScript became easy:

"{% trans "I love french fries." %}"

Becomes:

gettext("I love french fries.")
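Because gettext looks the string up at runtime, the translated text is never spliced into JavaScript source, so an apostrophe in a translation can’t end a string literal early. Here is a minimal sketch of that behavior. The catalog object and the gettext stand-in are assumptions for illustration only; in the browser, gettext comes from Django’s JavaScript translation catalog.

```javascript
// ASSUMPTION: a toy catalog standing in for the compiled French translations.
var catalog = {
  "I love french fries.": "J'adore les frites.",
};

// Stand-in for the global gettext() that Django's JavaScript catalog provides.
// Like the real function, it falls back to the original string when no
// translation is available.
function gettext(msgid) {
  return Object.prototype.hasOwnProperty.call(catalog, msgid)
    ? catalog[msgid]
    : msgid;
}

// The source string is an ordinary literal; the apostrophe in the translation
// lives in data, not in code, so quoting style no longer matters.
function friesMessage() {
  return gettext("I love french fries.");
}
```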

Passing server data to the client

Another common occurrence in historical code is dropping server data directly into JavaScript code:

<script>
   var thing = {{ thing|JSON }};
   actOnThing(thing);
   ...
</script>

In these cases, the data-manipulation code often already lives in its own JavaScript file, leaving just the data and an initialization statement in the template.

Our high-level solution to this is to move server data into DOM elements rather than directly into scripts, then pull that data back out of the DOM when the script needs it.

This approach adds a level of indirection, but that cost buys the larger reduction in complexity that comes from isolating JavaScript. To make this new level of indirection as painless as possible to adopt, we added two new pieces of code:

  1. A custom Django template tag to store the data in a conventional place
  2. A JavaScript utility to retrieve that data

Django: the initial_page_data tag

The master base template, from which nearly all pages derive, contains a hidden div to store server data that needs to be accessible to JavaScript:

<div class="initial-page-data hide">
{% block initial_page_data %}
   {# use initial_page_data template tag to populate #}
{% endblock %}
</div>

Descendants of this base template then register data using the initial_page_data template tag:

{% initial_page_data 'thingName' thing %}

This tag adds a div to the .initial-page-data div, with the given name and value set as data- attributes on this child div. The tag handles escaping the value and JSON-encoding it if necessary.

JavaScript: the initial_page_data library

A JavaScript library contains utility functions to access data stored in the .initial-page-data div. The library allows data to be undefined: attempting to access data that isn’t in the store returns undefined but does not fail. It does throw an error if it finds the same data name twice. Initial page data is effectively a global namespace, and this check prevents one piece of data from silently overwriting another that happens to share its name.
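A minimal sketch of such a reader follows. This is not the actual library: the function name createInitialPageData, the data-name and data-value attribute names, and the assumption that every value is JSON-encoded are all hypothetical choices made for illustration.

```javascript
// Builds a read-only store from the child elements of the page data div.
// ASSUMPTIONS: each element exposes its name in a data-name attribute and a
// JSON-encoded value in a data-value attribute.
function createInitialPageData(elements) {
  // No prototype, so names like "toString" are not mistaken for stored data.
  var store = Object.create(null);

  Array.prototype.forEach.call(elements, function (el) {
    var name = el.getAttribute("data-name");
    // The store is effectively a global namespace, so a repeated name is an error.
    if (name in store) {
      throw new Error("Duplicate initial page data name: " + name);
    }
    store[name] = JSON.parse(el.getAttribute("data-value"));
  });

  return {
    // Returns undefined, rather than throwing, for names that were never registered.
    get: function (name) {
      return store[name];
    },
  };
}
```

In a browser, under the same assumptions, the store could be built with createInitialPageData(document.querySelectorAll(".initial-page-data div")), after which a script would call get("thingName") instead of reading a template variable.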

Isolating JavaScript that contains server data then becomes a fairly mechanical process:

  1. Add initial_page_data tags to the template containing the script, one tag per piece of data
  2. In the script, import the initial_page_data.js library and replace each server-provided variable with a call to this library
  3. Move the inline script out to a .js file

The initial page data approach doesn’t always work well with partial templates, particularly when a partial is defining a widget that may be included multiple times on a page. In these cases, we tend to encode the server data as data- attributes in the widget’s DOM:

<div class="my-widget" data-option1="foo">
   ...
</div>

…and then fetch the data during the widget’s initialization:

$('.my-widget').each(function() {
   var $widget = $(this),
       option1 = $widget.data("option1"),
       ...
});

Control flow

Here and there, something like this pops up:

<script>
  {% if not items %}
    ...
  {% endif %}
  {% for item in items %}
    ...
  {% endfor %}
</script>

This is virtually always a variation on passing server data to the client, where Python loops over some server data or branches based on its value. Once the initial page data infrastructure described above was written, this kind of control flow became straightforward to rewrite with JavaScript constructs.
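As a rough sketch of that rewrite, the template-level if and for can become ordinary JavaScript once items reaches the script as data. The function name describeItems, the item.name property, and the returned strings are hypothetical placeholders for whatever the elided template bodies actually did.

```javascript
// items would come from the page's server data; here it is a plain array.
function describeItems(items) {
  // Replaces {% if not items %} ... {% endif %}
  if (!items || items.length === 0) {
    return ["No items."]; // hypothetical placeholder for the empty case
  }
  // Replaces {% for item in items %} ... {% endfor %}
  return items.map(function (item) {
    return "Item: " + item.name; // item.name is an assumed property
  });
}
```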

Client-related template tags

Sometimes Python functionality needed by JavaScript is exposed to the client via template tags, which can make fully decoupling JavaScript and Django seem impossible. These tags may come from Django itself, external libraries, or our own code.

Ultimately, custom tags generate a string, making this another variant of passing server data. The Django-provided csrf_token tag is a straightforward example. csrf_token, part of the defense against cross-site request forgery, generates a token to add to POST forms. Occasionally, we create a form in JavaScript and need access to that token. Since the token is just a string, we drop it in a DOM input element and can then access it from JavaScript:

$("#csrfTokenContainer").val()
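One way to use the retrieved token when assembling a form in JavaScript is sketched below. The helper name withCsrfField and the name/value pair format are hypothetical; csrfmiddlewaretoken is the field name Django’s CSRF middleware reads from POST data.

```javascript
// Returns a new list of form fields with the CSRF token appended.
// The input list is left untouched so callers can reuse it.
function withCsrfField(fields, csrfToken) {
  return fields.concat([{ name: "csrfmiddlewaretoken", value: csrfToken }]);
}
```

In the browser, the token argument would be the value read from the input element above, and each returned pair would become a hidden input in the generated form.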

Our migration process

Once the translation and data handoff infrastructure was in place, we instituted a convention, enforced during code review, that all new JavaScript development happens in .js files, not script blocks. That left us with the existing code to migrate.

Having learned to recognize and implement the patterns described above, we wrote up a guide with a process that any developer on the team could use to migrate a section of code, test it, and handle common bugs.

It’s critical for us to be able to migrate the code piecemeal; this is too big an undertaking to do all at once. The guide is oriented around eliminating all of the script blocks in a single page or even just one part of a page.

Migration steps:

  1. Replace trans with gettext
  2. Move any server-generated data out of JavaScript
  3. Replace any Python control flow with the JavaScript equivalent
  4. Replace any client-side template tags with the patterns we’ve established to replace them
  5. Move the now-Django-free JavaScript into a new file and add a script tag for this file to the Django template, or add the JavaScript to an existing .js file
  6. Test!

The migration itself is fairly straightforward, but testing can turn up some nasty bugs. These bugs are often the result of accidentally changing dependencies – providing more evidence that better dependency management would make our codebase more robust.

What’s next?

Clearing out inline scripts is a massive migration in and of itself, but it’s only the beginning of the road to using modern dependency management throughout the codebase. At this point, we’re about halfway done removing inline scripts, but that’s far enough along to start integrating a dependency management tool into migrated pages, while continuing the inline script work in parallel.

Next up, we’ll get into the heart of the dependency management overhaul: migrating individual pages from the old world to the new and setting up the infrastructure to simultaneously support both migrated and legacy code.

 

Written by
Jenny Schweers

Senior Engineer II
