Designing large applications for Google App Engine

MAY 5, 2011

By Amy Unruh, Developer, Author, Consultant

This post is part of Who's at Google I/O, a series of guest blog posts written by developers who are appearing in the Developer Sandbox at Google I/O.

Mojo Helpdesk from Metadot is an RDBMS-based Rails application for ticket tracking and management that can handle millions of tickets. We are migrating this application to run on Google App Engine (GAE), Java, and Google Web Toolkit (GWT). We were motivated to make this move because of the application’s need for scalability in data management and request handling, the benefits from access to GAE’s services and administrative tools, and GWT’s support for easy development of a rich application front-end.

In this post, we focus on GAE and share some techniques that have been useful in the migration process.

Task failure management

Our application makes heavy use of the Task Queue service, and must detect and manage tasks that are being retried multiple times but aren’t succeeding. To do this, we extended Deferred, which allows easy task definition and deployment. We defined a new Task abstraction, which implements an extended Deferrable and requires that every Task implement an onFailure method. Our extension of Deferred then terminates a Task permanently if it exceeds a threshold on retries, and calls its onFailure method.

This allows permanent task failure to be reliably exposed as an application-level event, and handled appropriately. (Similar techniques could be used to extend the new official Deferred API).

From the existing Mojo Helpdesk: a view of a user’s assigned tickets.

Appengine-mapreduce

Mojo Helpdesk needs to run many types of batch jobs, and appengine-mapreduce is of great utility. However, we often want to map over a filtered subset of Datastore entities, and our map implementations are JDO-based (to enforce consistent application semantics), so we don’t need low-level Entities prefetched. So, we made two extensions to the mapper libraries. First, we support the specification of filters on the mapper’s Datastore sharding and fetch queries, so that a job need not iterate over all the entities of a Kind. Second, our mapper fetch does a keys-only Datastore query; only the keys are provided to the map method, then the full data objects are obtained via JDO. These changes let us run large JDO-based mapreduce jobs with much greater efficiency.

Supporting transaction semantics

The Datastore supports transactions only on entities in the same entity group. Often, operations on multiple entities must be performed atomically, but grouping is infeasible due to the contention that would result. We make heavy use of transactional tasks to circumvent this restriction. (If a task is launched within a transaction, it will be run if and only if the transaction commits). A group of activities performed in this manner – the initiating method and its transactional tasks – can be viewed as a “transactional unit” with shared semantics.

We have made this concept explicit by creating a framework to support definition, invocation, and automatic logging of transactional units. (The Task abstraction above is used to identify cases where a transactional task does not succeed). All Datastore-related application actions – both in RPC methods and "offline" activities like mapreduce – use this framework. This approach has helped to make our application robust, by enforcing application-wide consistency in transaction semantics, and in the process, standardizing the events and logging which feed the app’s workflow systems.

From the existing Mojo Helpdesk: a view of the unassigned tickets for a work group.

Entity Design

To support join-like functionality, we can exploit multi-valued Entity properties (list properties) and the query support they provide. For example, a Ticket includes a list of associated Tag IDs, and Tag objects include a list of Ticket IDs they’re used with. This lets us very efficiently fetch, for example, all Tickets tagged with a conjunction of keywords, or any Tags that a set of tickets has in common. (We have found the use of "index entities" to be effective in this context). We also store derived counts and categorizations in order to sidestep Datastore restrictions on query formulation.

These patterns have helped us build an app whose components run efficiently and robustly, interacting in a loosely coupled manner.

Come see Mojo Helpdesk in the Developer Sandbox at Google I/O on May 10-11.

Amy (@amygdala) has recently co-authored (with Daniel Guermeur) a book on Google App Engine and GWT application development. She has worked at several startups, in academia, and in industrial R&D labs; consults and does technical training and course development in web technologies; and is a contributor to the @thinkupapp open source project.

Posted by Scott Knaster, Editor