This guest post was written by Aleem Mawani, Co-Founder of Streak, a startup alum of Y Combinator, a Silicon Valley incubator. Streak
is a CRM tool built into Gmail. In this post, Aleem shares his experience building and scaling
their product using Google Cloud Platform.
Everyone relies on email to get work done, yet most people use applications separate
from their email to help them with various business processes. Streak fixes this
problem by letting you do sales, hiring, fundraising, bug tracking, product development, deal
flow, project management and almost any other business process right inside Gmail. In this
post, I want to illustrate how we have used Google Cloud Platform to build Streak quickly, scale it
and analyze our data in depth.
We use several Google technologies on the backend of
Streak:
Our core learning is that
you should use the best tool for the job. No one technology will be able to solve all your
data storage and access needs. Instead, for each type of functionality, you should use a
different service. In our case, we aggressively mirror our data in all the services mentioned
above. For example, although the source of truth for our user data is in the App Engine
Datastore, we mirror that data in the App Engine Search API so that we can provide full text
search, Gmail style, to our users. We also mirror that same data in BigQuery so that we can
power internal dashboards.
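To make the mirroring idea concrete, here is a minimal sketch in Python rather than our production Java. It assumes a write path in which every record goes to the source of truth first and is then pushed to each mirror. The class `MirroredStore`, the two in-memory mirrors and the record fields are hypothetical stand-ins for the Datastore, the Search API and BigQuery, not real client calls.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


class MirroredStore:
    """Writes go to a primary store first, then fan out to registered mirrors."""

    def __init__(self) -> None:
        self.primary: Dict[str, Record] = {}  # stands in for the Datastore
        self.mirrors: List[Callable[[str, Record], None]] = []

    def add_mirror(self, push: Callable[[str, Record], None]) -> None:
        self.mirrors.append(push)

    def put(self, key: str, record: Record) -> None:
        self.primary[key] = dict(record)  # the source of truth is written first
        for push in self.mirrors:
            push(key, dict(record))  # each mirror receives its own copy


# Hypothetical mirrors: a lowercase text index and an append-only analytics table.
search_index: Dict[str, str] = {}
analytics_rows: List[Record] = []


def push_to_search(key: str, record: Record) -> None:
    search_index[key] = " ".join(record.values()).lower()


def push_to_analytics(key: str, record: Record) -> None:
    analytics_rows.append({"key": key, **record})


store = MirroredStore()
store.add_mirror(push_to_search)
store.add_mirror(push_to_analytics)
store.put("box-1", {"name": "Foo", "stage": "Lead"})
```

Each mirror shapes the same record for its own job: the search mirror keeps text for matching, while the analytics mirror keeps rows for aggregation.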
System Architecture
App Engine - We use App Engine for Java primarily to serve our application to
the browser and mobile clients in addition to serving our API. The App Engine Datastore is the source of
truth for all our data, so we aggressively cache using Memcache. We also use
Objectify to simplify access to the Datastore, which I highly
recommend.
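The caching pattern can be sketched as follows. This is an illustrative Python example, not our Java code and not the Memcache or Objectify API: two plain dictionaries stand in for the Datastore and the cache, and the class name `CachedReader` is hypothetical. Reads check the cache first, and writes invalidate the cached entry so that the Datastore remains the source of truth.

```python
from typing import Dict, Optional


class CachedReader:
    """Cache-aside reads in front of a slower, authoritative store."""

    def __init__(self, datastore: Dict[str, str]) -> None:
        self.datastore = datastore  # stands in for the Datastore
        self.cache: Dict[str, str] = {}  # stands in for Memcache
        self.datastore_reads = 0  # counts how often the cache was missed

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]  # cache hit: no Datastore read
        self.datastore_reads += 1
        value = self.datastore.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for the next read
        return value

    def put(self, key: str, value: str) -> None:
        self.datastore[key] = value  # write to the source of truth
        self.cache.pop(key, None)  # drop the stale cached copy, if any


reader = CachedReader({"user:1": "Aleem"})
first = reader.get("user:1")
second = reader.get("user:1")  # served from the cache
reader.put("user:1", "Aleem M.")
third = reader.get("user:1")  # cache was invalidated, so this reads the store
```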
Google Cloud Storage - We mirror all of our Datastore data as well as all our log data in Cloud
Storage, which acts as a conduit to other Google cloud services. It lets us archive the data
as well as push it to BigQuery and the Prediction API.
BigQuery - Pushing the data into BigQuery allows us to run non-realtime queries that can
help generate useful business metrics and slice user data to better understand how our product
is getting used. Not only can we run complex queries over our Datastore data but also over all
of our log data. This is incredibly powerful for analyzing the request patterns to App Engine.
We can answer questions like:
Which requests cost us the most money?
What is the average response time for every URL on our site over the last 3 days?
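As a rough illustration of what such queries look like, the sketch below runs two aggregations over a tiny request-log table. It uses Python's built-in `sqlite3` as a stand-in for BigQuery, so the SQL dialect differs, and the table name `request_logs`, its columns and the sample rows are all assumptions made for the example. The time filter for "the last 3 days" is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE request_logs (url TEXT, latency_ms REAL, cost_usd REAL)"
)
# Made-up sample rows; real rows would come from the exported request logs.
conn.executemany(
    "INSERT INTO request_logs VALUES (?, ?, ?)",
    [
        ("/api/boxes", 120.0, 0.002),
        ("/api/boxes", 80.0, 0.001),
        ("/api/search", 300.0, 0.010),
    ],
)

# Average response time for every URL.
avg_latency = conn.execute(
    "SELECT url, AVG(latency_ms) FROM request_logs GROUP BY url ORDER BY url"
).fetchall()

# The URL whose requests cost the most in total.
most_expensive = conn.execute(
    "SELECT url, SUM(cost_usd) AS total FROM request_logs "
    "GROUP BY url ORDER BY total DESC LIMIT 1"
).fetchone()
```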
BigQuery also helps us monitor
error rates in our application. Along with our debug statements, we log something
we call an “error type” for any request that fails. If it’s a known error, we log a sensible label; if we haven’t
seen it before, we log the exception type. This pays off because we built a dashboard that
queries BigQuery for the errors from the last hour, grouped by error type. Whenever
we do a release, we can easily monitor error rates in the application.
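The "error type" idea can be sketched like this. The example is in Python rather than our Java, and the mapping `KNOWN_ERRORS`, its labels and the helper names are hypothetical. Known exceptions map to a readable label, anything else falls back to the exception class name, and the grouping step mimics what the dashboard query does.

```python
from collections import Counter
from typing import Dict, Iterable, Type

# Hypothetical labels for errors we already know about.
KNOWN_ERRORS: Dict[Type[BaseException], str] = {
    TimeoutError: "backend_timeout",
    PermissionError: "not_authorized",
}


def error_type(exc: BaseException) -> str:
    """Return a readable label for known errors, else the exception class name."""
    for exc_class, label in KNOWN_ERRORS.items():
        if isinstance(exc, exc_class):
            return label
    return type(exc).__name__


def count_by_error_type(errors: Iterable[BaseException]) -> Counter:
    """Group failures by error type, as a dashboard query would."""
    return Counter(error_type(exc) for exc in errors)


# Made-up failures standing in for the last hour of failed requests.
recent_failures = [
    TimeoutError(),
    KeyError("box"),
    TimeoutError(),
    PermissionError(),
]
counts = count_by_error_type(recent_failures)
```

A sudden spike in one label right after a release points at the change that caused it.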
A Streak dashboard powered by BigQuery showing current usage
statistics
In order to move the data into Cloud Storage from the Datastore and
LogService, we developed an open source library called Mache. It’s a drop-in library that can be configured to automatically
push data into BigQuery via Cloud Storage. The data can come from the Datastore or from
LogService and is very configurable - feel free to contribute and give us feedback on
it!
Google Cloud Platform
also makes our application better for our users. We take advantage of the App Engine Search
API and again mirror our data there. Users can then query their Streak data using the familiar
Gmail full text search syntax, for example, “before:yesterday name:Foo”. Since we also push
our data to the Prediction API, we can help users throughout our app by making smart
suggestions. In Streak, we train models based on which emails users have categorized into
different projects. Then, when users get a new email, we can suggest the most likely box that
the email belongs to.
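For readers unfamiliar with the `field:value` query style, here is a small sketch of how a query such as “before:yesterday name:Foo” breaks down into filters. It is a toy tokenizer written for illustration, not the parser used by the Search API or by Streak. The function name `parse_query` is hypothetical, and quoted phrases and date handling are deliberately ignored.

```python
from typing import Dict, List, Tuple


def parse_query(query: str) -> Tuple[Dict[str, str], List[str]]:
    """Split a query into field:value filters and free-text terms."""
    filters: Dict[str, str] = {}
    terms: List[str] = []
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and field and value:
            filters[field.lower()] = value  # e.g. "name:Foo" -> {"name": "Foo"}
        else:
            terms.append(token)  # no usable "field:" prefix, treat as free text
    return filters, terms


filters, terms = parse_query("before:yesterday name:Foo")
mixed_filters, mixed_terms = parse_query("acme name:Foo")
```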
One issue that arises is
how to keep all these mirrored data sets in sync. It works differently for each service based
on the architecture of the service. Here’s a simple
breakdown:
Having these technologies easily
available to us has been a huge help for Streak. It makes our products better and helps us
understand our users. Streak’s user base grew 30% every week for 4 consecutive months after
launch, and we couldn’t have scaled this easily without Google Cloud Platform. To read more
details on why Cloud Platform makes sense for our business, check out our case study and our post on the
Google Enterprise
blog.
Aleem Mawani is the
co-founder of Streak.com, a CRM tool built into Gmail. Previously, Aleem worked on Google
Drive and various ads products at Google. He has a degree in software engineering from the
University of Waterloo and an MBA from Harvard University.