Thorsten
Schaeff has been studying Computer Science and Media at the Media University in Stuttgart and
the Institute of Art, Design and Technology in Dublin. For the past six months he’s been
interning with the Maps for Work Team in London, researching best practice architectures for
working with big geo data in the cloud.
Google’s Cloud Platform offers a great set of tools for creating easily scalable applications
in the cloud. In this post, I’ll explore some of the special challenges of working with
geospatial data in a cloud environment, and how Google’s Cloud can help. I’ve found that there
aren’t many options to do this, especially when dealing with complicated operations like geofencing with multiple complex
polygons. You can find the complete code for my approach on GitHub.
Geofencing is the procedure of identifying if a location lies within a certain fence, e.g.
neighborhood boundaries, school attendance zones or even the outline of a shop in a mall. It’s
particularly useful in mobile applications that need to apply this extra context to someone’s
exact location. This process isn’t actually as straight forward as you’d hope and, depending
on the complexity of your fences, can include some intense calculations and if your app gets a
lot of use, you need to make sure this doesn’t impact performance.
In order to simplify this problem this blogpost outlines the process of creating a scalable
but affordable geofencing API on Google’s App Engine.
And the best part? It’s completely free to start playing around.
If you want to dive right in you can download the finished geofencing API from my GitHub account.
The Architecture
The requirements are:
Storing complex fences (represented as polygons) and some metadata like a
name and a description. For this I use Cloud
Datastore, a scalable, fully managed, schemaless database for storing non-relational
data. You can even use this to store and serve GeoJSON to your frontend.
Indexing these polygons for fast querying in a spatial index. I use an STR-Tree
(part of JTS) which I store as a Java Object in memcache
for fast access.
Serving results to multiple platforms through HTTP requests. To achieve this I use
Google Cloud
Endpoints, a set of tools and libraries that allow you to generate APIs and client
libraries.
That’s all you need to get started - so let’s start cooking!
Creating the Project
To set up the project simply use Apache Maven and follow the instructions here.
This creates the correct folder structure, sets up the routing in the web.xml file for use
with Google’s API Explorer and creates a Java file with a sample endpoint.
Adding additional Libraries
I’m using the Java Topology
Suite to find out which polygon a certain latitude-longitude-pair is in. JTS is an
open source library that offers a nice set of geometric functions.
To include this library into your build path you simply add the JTS Maven
dependency to the pom.xml file in your project’s root directory.
In addition I’m using the GSON library to handle JSON within
the Java backend. You can basically use any JSON library you want to. If you want to use GSON
import this
dependency.
Writing your Endpoints
Adding Fences to Cloud Datastore
For the sake of convenience you’re only storing the fences’ vertices and some metadata. To
send and receive data through the endpoints you need an object model which you need to create
in a little Java Bean called MyFence.java.
https://gist.github.com/tschaeff/778b12fb116c36e3c08c
Now you need to create an endpoint called add. This endpoint expects a string for the group
name, a boolean indicating whether to rebuild the spatial index, and a JSON object
representing the fence’s object model. From this App Engine creates a new fence and writes it
to Cloud Datastore.
For some use cases it makes sense to fetch all the fences at once in the beginning, therefore
you want to have an endpoint to list all fences from a certain group.
Cloud Datastore uses internal
indexes to speed up queries. If you deploy the API directly to App Engine you’re
probably going to get an error message, saying that the Datastore query you’re using needs an
index. The App Engine Development server can auto-generate
the indexes, therefore I’d recommend testing all your endpoints on the development
server before pushing it to App Engine.
When querying the fences you only return the ids of the fences in the result, therefore you
need an endpoint to retrieve the metadata that corresponds to a fence’s id.
To speed up the geofencing queries you put the fences in an STR tree. The JTS library does
most of the heavy lifting here, so you only need to fetch all your fences from the Datastore,
create a polygon object for each one and add the polygon’s bounding box to the index.
You want to have an endpoint to test any latitude-longitude-pair against all your fences. This
is the actual geofencing part. That’s so you will be able to know, if the point falls into any
of your fences and if so, it should return the ids of the fences the point is in.
For this you first need to retrieve the index from memcache. Then query the index with the
bounding box of the point which returns a list of polygons. Since the index only tests against
the bounding boxes of the polygons, you need to iterate through the list and test if the point
actually lies within the polygon.
The process of testing for a point can easily be adapted to test the fences against polylines
and polygons. In the case of polylines you query the index with the polyline’s bounding box
and then test if the polyline actually crosses the returned fences.
https://gist.github.com/tschaeff/b23844335ec70688ea4c
When testing for a polygon you want to get back all fences that are either completely or
partly contained in the polygon. Therefore you test if the returned fences are within the
polygon or are not disjoint. For some use cases you only want to return fences that are
completely contained within the polygon. In that case you want to delete the not disjoint test
in the if clause.
The beauty of this is, since it’s running on App Engine, Google’s platform as a service
offering, it scales automatically and you only pay for what you use.
If you want to insure best performance and great scalability you should consider to switch
from the free and shared memcache to a dedicated
memcache for your application. This guarantees enough capacity for your spatial
index and therefore ensures enough space even for a large amount of complex fences.
That’s it - that’s all you need to create a fast and scalable geofencing API.
Preview: Processing Big Spatial Data in the Cloud with Dataflow
In my next post I will show you how I geofenced more than 340 million NYC Taxi locations in the
cloud using Google’s new Big Data tool called Cloud
Dataflow.