Cloud Platform at Google I/O - Enabling developers to tame production systems in the cloud

June 25, 2014

Link copied to clipboard
Author Picture By Brad Abrams, Google Cloud Platform Team

Whether it’s the next viral game, social sharing app or hit SaaS application, the velocity of your innovation is driven by the productivity of your dev team. This week at Google I/O we talked about several new tools that enable developers to understand, diagnose and improve their systems in production.

Cloud Debugger
Today the state of the art of debugging for cloud applications isn’t much more than writing out diagnostic messages and spelunking the logs for them. When the right data is not being written to the logs, developers have to make a code change and redeploy the application to production. That is the last thing you want to do when investigating an issue in production. Traditional debuggers aren’t well suited for cloud-based services for two reasons. First, it is difficult to know which process to attach to. Second, stopping a process in production makes it hard to reproduce an issue and gives your end-users a bad experience.

The Cloud Debugger completely changes this model. It allows developers to start where they know best - in the code. By simply setting a watchpoint on a line of code, the next time a request on any of your servers hits that line of code, you get a snapshot of all the local variables, parameters, instance variables and a full stack trace. This works no matter how many instances you are running in production. There is zero setup up time and no complex configuration to enable. The debugger is ideal for use in production. There is no overhead for enabling the debugger on a project and when a watchpoint is hit very little noticeable performance impact is seen by your users.

  Screen Shot 2014-06-19 at 10.34.21 AM.png

Cloud Trace
Performance is an important feature of your service which directly correlates with end user satisfaction and retention. No one intends to build a slow service, but it can be extremely difficult to isolate the root cause of sluggishness when it happens. Especially when the issue hits only a fraction of your users.

Cloud Trace helps you visualize and understand the time spent by your application for request processing. This enables you to quickly identify and fix performance bottlenecks. You can even compare performance from release to release with a detailed report. You can leave Cloud Trace enabled in production because it has very little performance overhead.

In this screenshot, you can see we have investigated a particularly slow trace and we see a detailed breakdown of where the time is being spent. It looks like the problem could be these numerous sequential calls to Datastore, so maybe we should consider batching them.

Short Waterfall.png

So we go update our service to batch the Datastore calls, and deploy the updated service. Now we can use Cloud Trace to verify the fix.

  Selection_081.png As a developer, you can easily produce a report that shows the performance change in your service from one release to another. In the following report, the blue graph shows the performance without datastore batching and the orange graph shows the performance after releasing the change to use datastore batching. The X-axis of the graph represents the time taken (logarithmic scale) to service requests, and the left shift of the orange graph shows the significant performance gain due to Datastore batching.


Cloud Monitoring, Powered by Stackdriver

Cloud Monitoring provides rich dashboards and alerting capabilities that help developers find and fix performance problems quickly.

With minimal configuration and no separate infrastructure to maintain, Cloud Monitoring provides you with deep visibility into your Cloud Platform services. For example, you can use Cloud Monitoring dashboards to diagnose cases where your customers are reporting slow response times or errors accessing your applications:

Likewise, you can create alerting policies so that you are notified when key metrics, such as latency or error rates, pass a given threshold in the future:

You can configure alerts for any metric in the system, including those related to the performance of Cloud SQL databases, App Engine modules and versions, Pub/Sub topics and subscriptions, and Compute Engine VMs. With Compute Engine VMs, you can create alerts for both core system metrics (CPU, memory, etc.) and application services running in the VMs (Apache, Cassandra, MongoDB, etc.).

You can also create dashboards that make it easier to correlate metrics across services. For example, it takes a few clicks to create a dashboard that tracks key metrics for an App Engine module that connects to a set of Redis VMs running on Compute Engine:

Finally, you can create endpoint checks to monitor availability and response times for your end-user facing services. Endpoint checks are performed by probes in Oregon, Texas, Virginia, Amsterdam, and Singapore, enabling monitoring of latency from each of these five regions.

SSH to your VM instantly

Sometimes it is inevitable to connect directly to a VM to debug or fix a production issue. We know this can be a bit of a pain, especially when you are on the road, so now you can do that from just about anywhere. With our new browser based SSH client you can quickly and securely connect to any of your VMs from the Console. No need to install any SDK or tools. The best part is, this works from any desktop device with most major web browsers.

Screen Shot 2014-06-15 at 7.19.10 AM.png

Ready for a Spin? All of these features are just about ready for your applications. Stay tuned to this blog, we will post updates as they are more widely available.

Posted by Louis Gray, Googler

Get it on Google Play