Simulating a neural operating system with Gemini 2.5 Flash-Lite

JUNE 25, 2025
D Shin, Senior Staff Software Engineer
Ali Eslami, Research Scientist
Madhavi Sewak, Technical Director

In traditional computing, user interfaces are pre-defined. Every button, menu, and window is meticulously coded by developers. But what if an interface could be generated in real time, adapting to a user's context with each interaction? We explored this question by building a research prototype (view demo app in Google AI Studio) for a generative, infinite computer experience.

Our prototype simulates an operating system where each screen is generated on the fly by a large language model. It uses Gemini 2.5 Flash-Lite, a model whose low latency is critical for creating a responsive interaction that feels instantaneous. Instead of navigating a static file system, the user interacts with an environment that the model builds and rebuilds with every click. This post outlines the core technical concepts behind this prototype.

Conditioning the model for on-the-fly UI generation

To generate a UI on-the-fly, we need to provide the model with a clear structure and context for each request. We engineered our prompt by dividing the model's input into two parts: a "UI constitution" and a "UI interaction".

The UI constitution is a system prompt that contains a fixed set of rules for UI generation. These rules define consistent elements like the OS-level styling, the home screen format, and logic for embedding elements like maps.

The UI interaction is a JSON object that captures the user's most recent action, such as a mouse click on an icon. This object serves as the specific query that prompts the model to generate the next screen. For example, clicking a “Save Note” button within the Notepad app might generate an object like the following:

{
  // `id`: The unique ID from the button's `data-interaction-id` attribute.
  id: 'save_note_action',

  // `type`: The interaction type from `data-interaction-type`.
  type: 'button_press',

  // `value`: Because the button has a `data-value-from` attribute, the system
  // retrieves the content from the textarea with the ID 'notepad_main_textarea'.
  value: 'Meeting notes\n- Discuss Q3 roadmap\n- Finalize budget',

  // `elementType`: The HTML tag of the element that was clicked.
  elementType: 'button',

  // `elementText`: The visible text inside the button.
  elementText: 'Save Note',

  // `appContext`: The ID of the application the user is currently in.
  // This comes from the `activeApp` state in `App.tsx`.
  appContext: 'notepad_app'
}
JSON
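
As a concrete illustration, the sketch below shows how such an object might be assembled from the `data-*` attributes referenced in the comments above. The `UIInteraction` type and `buildInteraction` helper are hypothetical names introduced here for clarity; they are not taken from the prototype's source.

interface UIInteraction {
  id: string;
  type: string;
  value?: string;
  elementType: string;
  elementText: string;
  appContext: string;
}

// Builds the interaction object for a clicked element by reading the
// data-interaction-id, data-interaction-type, and data-value-from attributes
// from the generated HTML. `activeApp` would come from state in App.tsx.
function buildInteraction(target: HTMLElement, activeApp: string): UIInteraction {
  const interaction: UIInteraction = {
    id: target.dataset.interactionId ?? 'unknown',
    type: target.dataset.interactionType ?? 'button_press',
    elementType: target.tagName.toLowerCase(),
    elementText: target.textContent?.trim() ?? '',
    appContext: activeApp,
  };

  // If the element declares a source field (e.g. the notepad textarea),
  // copy that field's current content into `value`.
  const sourceId = target.dataset.valueFrom;
  if (sourceId) {
    const source = document.getElementById(sourceId) as HTMLTextAreaElement | null;
    if (source) interaction.value = source.value;
  }

  return interaction;
}
TypeScript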

This two-part, context-setting approach allows the model to maintain a consistent look and feel while generating novel screens based on specific, real-time user inputs.
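
Below is a minimal sketch of how the two parts could be combined into a single request, assuming the `@google/genai` TypeScript SDK. `UI_CONSTITUTION` is a placeholder for the fixed rule set, and `generateScreen` reuses the `UIInteraction` type from the sketch above.

import { GoogleGenAI } from '@google/genai';

// Placeholder for the fixed rule set: OS-level styling, home screen format,
// logic for embedding elements like maps, and so on.
const UI_CONSTITUTION = '...';

const ai = new GoogleGenAI({ apiKey: 'YOUR_API_KEY' });

// Sends the UI constitution as the system prompt and the latest UI
// interaction as the user query, returning the HTML for the next screen.
async function generateScreen(interaction: UIInteraction): Promise<string> {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-lite',
    config: { systemInstruction: UI_CONSTITUTION }, // the "UI constitution"
    contents: JSON.stringify(interaction),          // the "UI interaction"
  });
  return response.text ?? '';
}
TypeScript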


Using interaction tracing for contextual awareness

A single interaction provides immediate context, but a sequence of interactions tells a richer story. Our prototype can use a trace of the past N interactions to generate a more contextually relevant screen. For example, the content generated within a calculator app could differ depending on whether the user previously visited a shopping cart or a travel booking app. By adjusting the length of this interaction trace, we can tune the balance between contextual accuracy and UI variability.
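
One way to keep such a trace is sketched below, assuming interactions are stored client-side as an array of the `UIInteraction` objects shown earlier; the names and the value of N are illustrative.

// Keep only the most recent N interactions. A longer trace gives the model
// more context (higher contextual accuracy); a shorter one leaves more room
// for UI variability.
const MAX_TRACE_LENGTH = 5; // hypothetical value of N

const interactionTrace: UIInteraction[] = [];

function recordInteraction(interaction: UIInteraction): void {
  interactionTrace.push(interaction);
  if (interactionTrace.length > MAX_TRACE_LENGTH) {
    interactionTrace.shift(); // drop the oldest interaction
  }
}

// Serialize the whole trace, not just the latest click, so that e.g. a
// calculator screen can reflect a previously visited shopping cart.
function tracePrompt(): string {
  return JSON.stringify(interactionTrace);
}
TypeScript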


Streaming the UI for a responsive experience

To make the system feel fast, we can't wait for the model to generate the entire UI screen before rendering. Our prototype leverages model streaming and the browser's native parser to implement progressive rendering. As the model generates HTML code in chunks, we continuously append it to our component's state. React then re-renders the content, allowing the browser to display valid HTML elements as soon as they are received. For the user, this creates the experience of an interface materializing on screen almost instantly.
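
A sketch of this progressive-rendering loop in a React component is shown below, again assuming the `@google/genai` streaming API and reusing `ai`, `UI_CONSTITUTION`, and `UIInteraction` from the earlier sketches; the component and function names are illustrative.

import { useState } from 'react';

function GeneratedScreen() {
  const [html, setHtml] = useState('');

  // Streams the next screen for a given interaction, appending each HTML
  // chunk to state so the browser can render valid elements as they arrive.
  // In the prototype this would be triggered by the screen's click handler.
  async function renderScreen(interaction: UIInteraction): Promise<void> {
    setHtml(''); // clear the previous screen
    const stream = await ai.models.generateContentStream({
      model: 'gemini-2.5-flash-lite',
      config: { systemInstruction: UI_CONSTITUTION },
      contents: JSON.stringify(interaction),
    });
    for await (const chunk of stream) {
      // Each appended chunk triggers a React re-render; the browser's native
      // parser displays whatever complete elements it has received so far.
      setHtml(prev => prev + (chunk.text ?? ''));
    }
  }

  return <div dangerouslySetInnerHTML={{ __html: html }} />;
}
TypeScript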


Achieving statefulness with a generative UI graph

By default, our model generates each new screen from scratch with every user input. This means visiting the same folder twice could produce entirely different contents. Such a non-deterministic, stateless experience may not always be desirable, given that the GUIs we are used to are largely static. To introduce statefulness, our demo system includes an option to build an in-memory cache that models a session-specific UI graph. When a user navigates to a screen that has already been generated, the system serves the stored version from the graph without querying Gemini again. When the user requests a screen not yet in the cache, the UI graph grows incrementally. This method provides state without compromising the quality of the generative output, which can be a side effect of simply lowering the model's sampling temperature.
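
A sketch of this session-level cache is below, assuming screens are keyed by the app and the element that produced them and reusing `generateScreen` from the earlier sketch; the keying scheme is an illustrative choice, not the prototype's.

// Session-specific UI graph: maps a screen key to the HTML that was
// generated for it.
const uiGraph = new Map<string, string>();

// Hypothetical keying scheme: the active app plus the activated element.
function screenKey(interaction: UIInteraction): string {
  return `${interaction.appContext}:${interaction.id}`;
}

async function getOrGenerateScreen(interaction: UIInteraction): Promise<string> {
  const key = screenKey(interaction);

  // Revisiting an already-generated screen: serve it from the graph
  // without querying Gemini again.
  const cached = uiGraph.get(key);
  if (cached !== undefined) return cached;

  // New screen: generate it and grow the graph incrementally.
  const html = await generateScreen(interaction);
  uiGraph.set(key, html);
  return html;
}
TypeScript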


Potential applications for just-in-time generative UI

While this is a conceptual prototype, the underlying framework could be applied to more practical use cases.

  • Contextual shortcuts: A system could observe a user's interaction patterns and generate an ephemeral UI panel to accelerate their task. For instance, as the user compares flights across multiple websites, a floating widget could appear just in time with dynamically generated buttons for comparing prices or booking a flight directly, saving the user several steps.

  • “Generative mode” in existing apps: Developers could add a "generative mode" to their applications. In Google Calendar, for example, a user could activate this mode to see just-in-time UIs. When moving a calendar invite, instead of a standard dialog, the system could generate a screen presenting the best alternative times as a series of directly selectable buttons based on attendees' schedules. This would create a hybrid experience where generative and static UI elements coexist seamlessly in one application.


Exploring novel concepts like this helps us understand how new paradigms for human-computer interaction are evolving. As models continue to get faster and more capable, we believe generative interfaces represent a promising area for future research and development.