Cloud Platform at Google I/O - Enabling developers to tame production systems
in the cloud
By Brad Abrams, Google Cloud Platform Team
Whether it’s the next viral game, social sharing app or hit SaaS application, the velocity of
your innovation is driven by the productivity of your dev team. This week at Google I/O we
talked about several new tools that enable developers to understand, diagnose and improve
their systems in production.
Cloud Debugger
Today the state of the art of debugging for cloud applications isn’t much more than writing
out diagnostic messages and spelunking the logs for them. When the right data is not being
written to the logs, developers have to make a code change and redeploy the application to
production. That is the last thing you want to do when investigating an issue in production.
Traditional debuggers aren’t well suited for cloud-based services for two reasons. First, it
is difficult to know which process to attach to. Second, stopping a process in production
makes it hard to reproduce an issue and gives your end-users a bad experience.
The Cloud Debugger completely changes this model. It allows developers to start where they
know best - in the code. By simply setting a watchpoint on a line of code, the next time a
request on any of your servers hits that line of code, you get a snapshot of all the local
variables, parameters, instance variables and a full stack trace. This works no matter how
many instances you are running in production. There is zero setup time and no complex configuration to enable. The debugger is ideal for use in production: enabling it on a project adds no overhead, and when a watchpoint is hit, your users see very little noticeable performance impact.
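To make concrete what a snapshot contains, here is a rough Python sketch. It is illustrative only: capture_snapshot is a hypothetical stand-in, and the real Cloud Debugger gathers this data in-process with negligible overhead rather than walking the stack like this.

```python
import inspect

def capture_snapshot():
    # Hypothetical stand-in: gather what a debugger snapshot records
    # at a watchpoint -- each frame's local variables plus the call stack.
    frames = []
    for frame_info in inspect.stack()[1:]:  # skip this helper's own frame
        frame = frame_info[0]
        frames.append({
            "function": frame.f_code.co_name,
            "line": frame.f_lineno,
            "locals": dict(frame.f_locals),
        })
    return frames

def handle_request(user_id):
    cart_total = 42  # a local variable the snapshot records
    return capture_snapshot()

snapshot = handle_request("u123")
print(snapshot[0]["function"])              # handle_request
print(snapshot[0]["locals"]["cart_total"])  # 42
```

The point of the sketch is what gets captured, not how: at the watchpoint you receive every frame's locals and the full stack, without pausing the process the way a traditional attached debugger would.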
Cloud Trace
Performance is an important feature of your service that directly correlates with end-user satisfaction and retention. No one intends to build a slow service, but it can be extremely difficult to isolate the root cause of sluggishness when it happens, especially when the issue affects only a fraction of your users.
Cloud Trace helps you visualize and understand the time your application spends processing requests, so you can quickly identify and fix performance bottlenecks. You can even
compare performance from release to release with a detailed report. You can leave Cloud Trace
enabled in production because it has very little performance overhead.
In this screenshot, we have investigated a particularly slow trace and can see a detailed breakdown of where the time is being spent. It looks like the problem could be these numerous sequential calls to Datastore, so we should consider batching them.
So we go update our service to batch the Datastore calls, and deploy the updated service. Now
we can use Cloud Trace to verify the fix.
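The batching change can be sketched with a stand-in client. FakeDatastore below is hypothetical and exists only to count round trips; on App Engine, ndb.get_multi(keys) performs the real batched lookup.

```python
# Hypothetical client used only to illustrate the pattern; each call to
# get() stands for one RPC round trip to Datastore.
class FakeDatastore:
    def __init__(self, rows):
        self.rows = rows
        self.rpc_count = 0

    def get(self, key):
        self.rpc_count += 1          # one round trip per key
        return self.rows[key]

    def get_multi(self, keys):
        self.rpc_count += 1          # one round trip for the whole batch
        return [self.rows[k] for k in keys]

store = FakeDatastore({k: k * 2 for k in range(50)})

# Sequential: 50 round trips -- the pattern the slow trace revealed.
sequential = [store.get(k) for k in range(50)]
n_sequential_rpcs = store.rpc_count

# Batched: the same data in a single round trip.
store.rpc_count = 0
batched = store.get_multi(list(range(50)))

print(n_sequential_rpcs, store.rpc_count)  # 50 1
assert sequential == batched
```

Because each round trip carries fixed network latency, collapsing 50 sequential calls into one batch removes 49 latencies from the request's critical path, which is exactly the kind of win the trace comparison below makes visible.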
As a developer, you can easily produce a report that shows the performance change in your service from one release to another. In the following report, the blue graph shows performance without Datastore batching and the orange graph shows performance after releasing the change to batch the Datastore calls. The X-axis of the graph represents the time taken (logarithmic scale) to service requests, and the left shift of the orange graph shows the significant performance gain from Datastore batching.
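That left shift can be reproduced in miniature by bucketing latencies into logarithmic bins, the same scale the report's X-axis uses. The latency samples below are made up purely for illustration.

```python
import math

def log_bin(latency_ms, base=2):
    # Index of the logarithmic bucket a latency falls into.
    return int(math.log(max(latency_ms, 1), base))

def histogram(samples):
    hist = {}
    for ms in samples:
        hist[log_bin(ms)] = hist.get(log_bin(ms), 0) + 1
    return hist

# Invented numbers: latencies before and after batching.
before = [120, 250, 400, 800, 950]
after = [60, 90, 130, 200, 240]

print(histogram(before))
print(histogram(after))  # mass moves to lower buckets: the "left shift"
```

On a log scale, a constant speedup factor appears as a uniform shift of the whole distribution to the left, which is why the report makes release-to-release improvements easy to spot.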
Cloud Monitoring, Powered by Stackdriver
Cloud Monitoring provides rich dashboards and alerting capabilities that help developers find
and fix performance problems quickly.
With minimal configuration and no separate infrastructure to maintain, Cloud Monitoring
provides you with deep visibility into your Cloud Platform services. For example, you can use
Cloud Monitoring dashboards to diagnose cases where your customers are reporting slow response
times or errors accessing your applications:
Likewise, you can create alerting policies so that you are notified when key metrics, such as
latency or error rates, pass a given threshold in the future:
You can configure alerts for any metric in the system, including those related to the
performance of Cloud SQL databases, App Engine modules and versions, Pub/Sub topics and
subscriptions, and Compute Engine VMs. With Compute Engine VMs, you can create alerts for both
core system metrics (CPU, memory, etc.) and application services running in the VMs (Apache,
Cassandra, MongoDB, etc.).
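The core of a threshold alerting policy is easy to sketch. This is a conceptual model only; Policy and evaluate are hypothetical names, not the Cloud Monitoring API.

```python
from collections import namedtuple

# A policy fires when a metric stays above a threshold for a sustained
# window, so one noisy sample does not page anyone.
Policy = namedtuple("Policy", "metric threshold duration_points")

def evaluate(policy, recent_values):
    window = recent_values[-policy.duration_points:]
    return (len(window) == policy.duration_points and
            all(v > policy.threshold for v in window))

latency_policy = Policy(metric="latency_ms", threshold=500, duration_points=3)

print(evaluate(latency_policy, [510, 490, 530, 520]))  # False (dip below)
print(evaluate(latency_policy, [490, 510, 530, 520]))  # True
```

The duration requirement is the important design choice: alerting on a single sample above threshold trades noise for speed, while requiring a sustained breach trades a little detection latency for far fewer false pages.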
You can also create dashboards that make it easier to correlate metrics across services. For
example, it takes a few clicks to create a dashboard that tracks key metrics for an App Engine
module that connects to a set of Redis VMs running on Compute Engine:
Finally, you can create endpoint checks to monitor availability and response times for your
end-user facing services. Endpoint checks are performed by probes in Oregon, Texas, Virginia,
Amsterdam, and Singapore, enabling monitoring of latency from each of these five regions.
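An endpoint check boils down to timing a request and recording whether it succeeded. Here is a minimal sketch; the status handling and function name are illustrative assumptions, not how the real probes are implemented.

```python
import time
import urllib.request

def check_endpoint(url, timeout_s=10):
    # Time one HTTP request; treat 2xx/3xx as healthy.
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            healthy = 200 <= resp.status < 400
    except OSError:
        healthy = False  # connection error, HTTP error, or timeout
    latency_ms = (time.monotonic() - start) * 1000.0
    return healthy, latency_ms
```

Running the same check from several regions, as the probes above do, separates a genuinely slow service from one that is only slow to reach from a particular part of the world.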
SSH to your VM instantly
Sometimes connecting directly to a VM to debug or fix a production issue is unavoidable. We know this can be a bit of a pain, especially when you are on the road, so now you can do it from just about anywhere. With our new browser-based SSH client you can quickly and securely connect to any of your VMs from the Console, with no SDK or tools to install. Best of all, this works from any desktop device with most major web browsers.
Ready for a Spin?
All of these features are just about ready for your applications. Stay tuned to this blog; we will post updates as they become more widely available.
Posted by Louis Gray, Googler