Posted by Alex Fischer, Software
Engineer, Google Search.
TL;DR: Today, we're adding a feature to the AMP integration in Google Search
that allows users to access, copy, and share the canonical URL of an AMP
document. But before diving deeper into the news, let's take a step back to
elaborate more on URLs in the AMP world and how they relate to the speed
benefits of AMP.
What's in a URL? On the web, a lot - URLs and origins represent, to some
extent, trust and ownership of content. When you're reading a New York Times
article, a quick glimpse at the URL gives you a level of trust that what you're
reading represents the voice of the New York Times. Attribution, brand, and
ownership are clear.
Recent product launches in different mobile apps and the recent launch of AMP
in Google Search have blurred this line a little. In this post, I'll first
try to explain the reasoning behind some of the technical decisions we made and
make sense of the different kinds of AMP URLs that exist. I'll then outline
changes we are making to address the concerns
around
URLs.
To start with, AMP documents have three different kinds of URLs:
Original URL: The publisher's document written in the AMP format.
http://www.example.com/amp/doc.html
AMP Cache URL: The document served through an AMP Cache (e.g., all AMPs
served by Google are served through the Google AMP Cache). Most
users will never see this URL.
https://www-example-com.cdn.ampproject.org/c/www.example.com/amp/doc.html
Google AMP Viewer URL: The document displayed in an AMP viewer (e.g., when
rendered on the search result page).
https://www.google.com/amp/www.example.com/amp.doc.html
Although having three different URLs with different origins for essentially the
same content can be confusing, there are two main reasons why these different
URLs exist: caching and pre-rendering. Both are large contributors to AMP's
speed, but require new URLs and I will elaborate on why that is.
AMP Cache URLs
Let's start with AMP Cache URLs. Paul Bakaus, a Google Developer Advocate for
AMP, has an excellent post describing why AMP
Caches exist. Paul's post goes into great detail describing the benefits of
AMP Caches, but it doesn't quite answer the question why they require new URLs.
The answer to this question comes down to one of the design principles of AMP:
build for easy adoption. AMP tries to solve some of the problems of the mobile
web at scale, so its components must be easy to use for everyone.
There are a variety of options to get validation, proximity to users, and other
benefits provided by AMP Caches. For a small site, however, that doesn't manage
its own DNS entries, doesn't have engineering resources to push content through
complicated APIs, or can't pay for content delivery networks, a lot of these
technologies are inaccessible.
For this reason, the Google AMP Cache works by means of a simple URL
"transformation." A webmaster only has to make their content available at some
URL and the Google AMP Cache can then cache and serve the content through
Google's world-wide infrastructure through a new URL that mirrors and transforms
the original. It's as simple as that. Leveraging an AMP Cache using the original
URL, on the other hand, would require the webmaster to modify their DNS records
or reconfigure their name servers. While some sites do just that, the URL-based
approach is easier to use for the vast majority of sites.
AMP Viewer URLs
In the previous section, we learned about Google AMP Cache URLs -- URLs that
point to the cached version of an AMP document. But what about
www.google.com/amp URLs? Why are they needed? These are "AMP
Viewer" URLs and they exist because of pre-rendering.
AMP's built-in support for privacy and resource-conscientious pre-rendering is
rarely talked about and often misunderstood. AMP documents can be pre-rendered
without setting off a cascade of resource fetches, without hogging up users' CPU
and memory, and without running any privacy-sensitive analytics code. This works
regardless of whether the embedding application is a mobile web page or a native
application. The need for new URLs, however, comes mostly from mobile web
implementations, so I am using Google's mobile search result page (SERP) as an
illustrative example.
How does pre-rendering
work?
When a user performs a Google search that returns AMP-enabled results, some of
these results are pre-rendered behind the scenes. When the user clicks on a
pre-rendered result, the AMP page loads instantly.
Pre-rendering works by loading a hidden iframe on the embedding page (the search
result page) with the content of the AMP page and an additional parameter that
indicates that the AMP document is only being pre-rendered. The JavaScript
component that handles the lifecycle of these iframes is called "AMP Viewer".
The AMP Viewer pre-renders an AMP document in a hidden
iFrame.
The user's browser loads the document and the AMP runtime and starts rendering
the AMP page. Since all other resources, such as images and embeds, are managed
by the AMP runtime, nothing else is loaded at this point. The AMP runtime may
decide to fetch some resources, but it will do so in a resource and privacy
sensible way.
When a user clicks on the result, all the AMP Viewer has to do is show the
iframe that the browser has already rendered and let the AMP runtime know that
the AMP document is now visible.
As you can see, this operation is incredibly cheap - there is no network
activity or hard navigation to a new page involved. This leads to a near-instant
loading experience of the result.
Where do google.com/amp URLs come
from?
All of the above happens while the user is still on the original page (in our
example, that's the search results page). In other words, the user hasn't gone
to a different page; they have just viewed an iframe on the same page and so the
browser doesn't change the URL.
We still want the URL in the browser to reflect the page that is displayed on
the screen and make it easy for users to link to. When users hit refresh in
their browser, they expect the same document to show up and not the underlying
search result page. So the AMP viewer has to manually update this URL. This
happens using the History API. This API allows the AMP Viewer to update the
browser's URL bar without doing a hard navigation.
The question is what URL the browser should be updated to. Ideally, this would
be the URL of the result itself (e.g.,
www.example.com/amp/doc.html); or the AMP Cache URL (e.g.,
www-example-com.cdn.ampproject.org/www.example.com/amp/doc.html).
Unfortunately, it can't be either of those. One of the main restrictions of the
History API is that the new URL must be on the same origin as the original URL
(reference).
This is enforced by browsers (for security
reasons), but it means that in Google Search, this URL has to be on the
www.google.com origin.
Why do we show a header
bar?
The previous section explained restrictions on URLs that an AMP Viewer has to
handle. These URLs, however, can be confusing and misleading. They can open up
the doors to phishing attacks. If an AMP page showed a login page that looks
like Google's and the URL bar says www.google.com, how would a user
know that this page isn't actually Google's? That's where the need for
additional attribution comes in.
To provide appropriate attribution of content, every AMP Viewer must make it
clear to users where the content that they're looking at is coming from. And one
way of accomplishing this is by adding a header bar that displays the "true"
origin of a page.
What's next?
I hope the previous sections made it clear why these different URLs exist and
why there needs to be a header in every AMP viewer. We have heard how you feel
about this approach and the importance of URLs. So what next? As you know, we
want to be thoughtful in what we do and ensure that we don't break the speed and
performance users expect from AMP pages.
Since the launch of AMP
in Google Search in Feb 2016, we have taken the following steps:
All Google URLs (i.e., the Google AMP cache URL and the Google AMP viewer
URL) reflect the original source of the content as best as possible:
www.google.com/amp/www.example.com/amp/doc.html.
When users scroll down the page to read a document, the AMP viewer header
bar hides, freeing up precious screen real-estate.
When users visit a Google AMP viewer URL on a platform where the viewer is
not available, we redirect them to the canonical page for the
document.
In addition to the above, many users have requested a way to access, copy, and
share the canonical URL of a document. Today, we're adding support for this
functionality in form of an anchor button in the AMP Viewer header on Google
Search. This feature allows users to use their browser's native share
functionality by long-tapping on the link that is displayed.
In the coming weeks, the Android Google app will share the original URL of a
document when users tap on the app's share button. This functionality is already
available on the iOS Google app.
Lastly, we're working on leveraging upcoming web platform APIs that allow us to
improve this functionality even further. One such API is the Web
Share API that would allow AMP viewers to invoke the platform's native
sharing flow with the original URL rather than the AMP viewer URL.
We as Google have every intention in making the AMP experience as good as we can
for both, users and publishers. A thriving ecosystem is very important to us and
attribution, user trust, and ownership are important pieces of this ecosystem. I
hope this blog post helps clear up the origin of the three URLs of AMP
documents, their role in making AMP fast, and our efforts to further improve the
AMP experience in Google Search. Lastly, an ecosystem can only flourish with
your participation: give us feedback
and get involved with AMP.