Recommended strategies and best practices for designing and developing games and stories on Google Assistant

APR 26, 2021

Posted by Wally Brill and Jessica Dene Earley-Cha

Illustration of pink car collecting coins

Since we launched Interactive Canvas, and especially in the last year we have been helping developers create great storytelling and gaming experiences for Google Assistant on smart displays. Along the way we’ve learned a lot about what does and doesn’t work. Building these kinds of interactive voice experiences is still a relatively new endeavor, and so we want to share what we've learned to help you build the next great gaming or storytelling experience for Assistant.

Here are three key things to keep in mind when you’re designing and developing interactive games and stories. These three were selected from a longer list of lessons learned (stay tuned to the end for the link for the 10+ lessons) because they are dependent on Action Builder/SDK functionality and can be slightly different for the traditional conversation design for voice only experiences.

1. Keep the Text-To-Speech (TTS) brief

Text-to-speech, or computer generated voice, has improved exponentially in the last few years, but it isn’t perfect. Through user testing, we’ve learned that users (especially kids) don’t like listening to long TTS messages. Of course, some content (like interactive stories) should not be reduced. However, for games, try to keep your script simple. Wherever possible, leverage the power of the visual medium and show, don’t tell. Consider providing a skip button on the screen so that users can read and move forward without waiting until the TTS is finished. In many cases the TTS and text on a screen won’t always need to mirror each other. For example the TTS may say "Great job! Let's move to the next question. What’s the name of the big red dog?" and the text on screen may simply say "What is the name of the big red dog?"


You can provide different audio and screen-based prompts by using a simple response, which allows different verbiage in the speech and text sections of the response. With Actions Builder, you can do this using the node client library or in the JSON response. The following code samples show you how to implement the example discussed above:

      - first_simple:
      - speech: Great job! Let's move to the next question. What’s the name of the big red dog?
      text: What is the name of the big red dog?

Note: implementation in YAML for Actions Builder

app.handle('yourHandlerName', conv => {
      conv.add(new Simple({
      speech: 'Great job! Let\'s move to the next question. What’s the name of the big red dog?',
      text: 'What is the name of the big red dog?'

Note: implementation with node client library

2. Consider both first-time and returning users

Frequent users don't need to hear the same instructions repeatedly. Optimize the experience for returning users. If it's a user's first time experience, try to explain the full context. If they revisit your action, acknowledge their return with a "Welcome back" message, and try to shorten (or taper) the instructions. If you noticed the user has returned more than 3 or 4 times, try to get to the point as quickly as possible.

An example of tapering:


You can check the lastSeenTime property in the User object of the HTTP request. The lastSeenTime property is a timestamp of the last interaction with this particular user. If this is the first time a user is interacting with your Action, this field will be omitted. Since it’s a timestamp, you can have different messages for a user who’s last interaction has been more than 3 months, 3 weeks or 3 days. Below is an example of having a default message that is tapered. If the lastSeenTime property is omitted, meaning that it's the first time the user is interacting with this Action, the message is updated with the longer message containing more details.

app.handle('greetingInstructions', conv => {
      let message = 'Make up words from the jumbled letters. Ready?';
      if (!conv.user.lastSeenTime) {
      message = 'Just say words you can make from the letters provided. Are you ready to begin?';

Note: implementation with node client library

3. Support strongly recommended intents

There are some commonly used intents which really enhance the user experience by providing some basic commands to interact with your voice app. If your action doesn’t support these, users might get frustrated. These intents help create a basic structure to your voice user interface, and help users navigate your Action.


Actions Builder & Actions SDK support System Intents that cover a few of these use case which contain Google support training phrase:

For the remaining intents, you can create User Intents and you have the option of making them Global (where they can be triggered at any Scene) or add them to a particular scene. Below are examples from a variety of projects to get you started:

So there you have it. Three suggestions to keep in mind for making amazing interactive games and story experiences that people will want to use over and over again. To check out the full list of our recommendations go to the Lessons Learned page.

Thanks for reading! To share your thoughts or questions, join us on Reddit at r/GoogleAssistantDev.

Follow @ActionsOnGoogle on Twitter for more of our team's updates, and tweet using #AoGDevs to share what you’re working on. Can’t wait to see what you build!