Introducing Discovery Ad Performance Analysis
Posted by Manisha Arora, Nithya Mahadevan, and Aritra Biswas, gPS Data Science team
Overview of Discovery Ads and the Need for Ad Performance Analysis
Discovery ads, launched in May 2019, allow advertisers to easily extend the reach of their social ads to users across YouTube, Google Feed, and Gmail worldwide. They provide brands a new opportunity to reach 3 billion people as they explore their interests and search for inspiration across their favorite Google feeds (YouTube, Gmail, and Discover), all with a single campaign. Learn more about Discovery ads here.
Because of this unique reach, customers need a data-driven method to identify the textual and imagery elements in Discovery ad copies that drive the Interaction Rate of their Discovery ad campaigns, where an interaction is defined as the main user action associated with an ad format: clicks and swipes for text and Shopping ads, views for video ads, calls for call extensions, and so on.
Interaction Rate = interactions / impressions
“Customers need a data driven method to identify textual & imagery elements in Discovery Ad copies that drive Interaction Rate of their campaigns.”
- Manisha Arora, Data Scientist
Our analysis approach:
The Data Science team at Google is investing in a machine learning approach to uncover insights from complex unstructured data and provide machine-learning-based recommendations to our customers. Machine learning helps us study what works in ads at scale, and these insights can greatly benefit advertisers.
We follow a six-step approach for Discovery Ad Performance Analysis:
- Understand Business Goals
- Build Creative Hypothesis
- Data Extraction
- Feature Engineering
- Machine Learning Modeling
- Analysis & Insight Generation
To begin with, we work closely with the advertisers to understand their business goals, current ad strategy, and future goals. We closely map this to industry insights to draw a larger picture and provide a customized analysis for each advertiser. As a next step, we build hypotheses that best describe the problem we are trying to solve. An example of a hypothesis can be: “Do superlatives (words like ‘top’ and ‘best’) in the ad copy drive performance?”
“Machine Learning helps us study what works in ads at scale and these insights can greatly benefit the advertisers.”
- Manisha Arora, Data Scientist
Once we have a hypothesis we are working towards, the
next step is to deep-dive into the technical analysis.
Data Extraction & Pre-processing
Our initial dataset includes raw ad text, imagery, performance KPIs, and target audience details from historic ad campaigns in the industry. Each Discovery ad contains two text assets (Headline and Description) and one image asset. We then apply ML to extract text and image features from these assets.
Text Feature Extraction
We apply NLP to extract the text features from the ad text. We pass the raw text in the ad headline and description through Google Cloud’s Language API, which parses the raw text into our feature set: commonly used keywords, sentiment, and so on.
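As an illustration, here is a minimal sketch of this step using the Cloud Natural Language Python client (the sample headline is hypothetical, and authentication setup is omitted):

```python
# Minimal sketch: extract sentiment and salient keywords from ad text with
# the Cloud Natural Language API. The sample headline is hypothetical.
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

headline = "Top savings rates - open your account today"  # hypothetical ad text
document = language_v1.Document(
    content=headline, type_=language_v1.Document.Type.PLAIN_TEXT
)

# Document-level sentiment: score in [-1, 1], magnitude >= 0.
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print("sentiment:", sentiment.score, sentiment.magnitude)

# Entities approximate the commonly used keywords in the copy.
entities = client.analyze_entities(request={"document": document}).entities
print("keywords:", [(e.name, round(e.salience, 2)) for e in entities])
```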
Image Feature Extraction
We apply image processing to extract image features from the ad copy imagery. We pass the raw images through Google Cloud’s Vision API and extract image components including objects, people, background, lighting, etc.
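A comparable sketch for the image side, using the Cloud Vision Python client (the file path is hypothetical and authentication setup is omitted):

```python
# Minimal sketch: pull objects, labels, and face signals from an ad image
# with the Cloud Vision API. The file path is hypothetical.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("ad_image.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Objects and scene labels detected in the creative.
objects = client.object_localization(image=image).localized_object_annotations
labels = client.label_detection(image=image).label_annotations

# Faces, including per-face likelihoods such as joy (useful for the
# smiling-face features described below).
faces = client.face_detection(image=image).face_annotations

print("objects:", [o.name for o in objects])
print("labels:", [l.description for l in labels])
print("faces detected:", len(faces))
```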
The sections below describe the holistic set of features that are extracted from the ad content.
Feature Design
Text Feature Design
There are two types of text features included in DisCat:
1. Generic text features
a. These are features returned by Google Cloud’s Language API, including sentiment, word / character count, tone (imperative vs. indicative), symbols, most frequent words, and so on.
2. Industry-specific value propositions
a. These are features that apply only to a specific industry (e.g. finance) and are manually curated by the data science developer in collaboration with specialists and other industry experts.
- For example, for the finance industry, one value proposition can be “Price Offer”. A list of keywords / phrases that are related to price offers (e.g. “discount”, “low rate”, “X% off”) is curated based on domain knowledge to identify this value proposition in the ad copies. NLP techniques (e.g. WordNet synsets) and manual examination are used to make sure this list is inclusive and accurate; a sketch of this expansion step follows this list.
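As one possible way to implement the synset expansion mentioned above, here is a sketch using NLTK’s WordNet interface (the seed list is illustrative, not the curated production list):

```python
# Sketch: expand a seed keyword list for a value proposition ("Price Offer")
# with WordNet synonyms. Candidates still go through manual review.
# First run requires: import nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn

seeds = ["discount", "deal", "saving"]  # illustrative seed terms
expanded = set(seeds)
for word in seeds:
    for synset in wn.synsets(word):
        expanded.update(lemma.name().replace("_", " ") for lemma in synset.lemmas())

print(sorted(expanded))  # candidate phrases to accept or reject by hand
```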
Image Feature Design
Like the text features, image features can largely be grouped into two categories:
1. Generic image features
a. These features apply to all images and include the color profile, whether any logos were detected, how many human faces are included, etc.
b. The face-related features also include some advanced aspects: we look for prominent smiling faces looking directly at the camera, and we differentiate between individuals vs. small groups vs. crowds.
2. Object-based features
a. These features are based on the list of objects and labels detected in all the images in the dataset, which can often be a massive list including generic objects like “Person” and specific ones like particular dog breeds.
b. The biggest challenge here is dimensionality: we have to cluster together related objects into logical themes like natural vs. urban imagery.
c. We currently have a hybrid approach to this problem: we use unsupervised clustering approaches to create an initial clustering, but we manually revise it as we inspect sample images. The process is (see the code sketch after this list):
- Extract object and label names (e.g. Person, Chair, Beach, Table) from the Vision API output and filter out the most uncommon objects.
- Convert these names to 50-dimensional semantic vectors using a Word2Vec model trained on the Google News corpus.
- Using PCA, extract the top 5 principal components from the semantic vectors. This step takes advantage of the fact that each Word2Vec neuron encodes a set of commonly adjacent words, and different sets represent different axes of similarity and should be weighted differently.
- Use an unsupervised clustering algorithm, namely either k-means or DBSCAN, to find semantically similar clusters of words.
- We are also exploring augmenting this approach with a combined distance metric, d(w1, w2) = a * (semantic distance) + b * (co-appearance distance), where the latter is a Jaccard distance metric.
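The sketch below illustrates this pipeline under stated assumptions: the label list is illustrative, the publicly available Google News Word2Vec vectors are 300-dimensional (the pipeline above describes 50-dimensional vectors), and k-means stands in for either clustering choice:

```python
# Sketch: cluster Vision API object/label names into logical themes.
import numpy as np
from gensim.models import KeyedVectors
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Public pretrained vectors (300-d); the text above describes 50-d vectors.
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

raw_labels = ["person", "chair", "beach", "table", "forest", "skyscraper"]
labels = [w for w in raw_labels if w in kv]  # drop out-of-vocabulary names
vectors = np.array([kv[w] for w in labels])

# Keep the top principal components, weighting the dominant axes of
# semantic similarity.
components = PCA(n_components=5).fit_transform(vectors)

# k-means (DBSCAN is the alternative) proposes candidate themes, which are
# then manually revised against sample images.
themes = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(components)
print(dict(zip(labels, themes)))
```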
Each of these components represents a choice the advertiser made when creating the
messaging for an ad. Now that we have a variety of ads broken down into components, we can
ask: which components are associated with ads that perform well or not so well?
We use a fixed effects[1] model to control for unobserved differences in the context in which different ads were served. This is because the features we are measuring are observed multiple times in different contexts, i.e. ad copy, audience groups, time of year, and device on which the ad is served.
The trained model seeks to estimate the impact of individual keywords, phrases, and image components in the Discovery ad copies. The model estimates Interaction Rate (denoted as ‘IR’ in the following formulas) as a function of individual ad copy features plus controls:
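The original formula is not reproduced here; as a sketch, assuming a standard linear fixed-effects specification consistent with the description above:

$$\mathrm{IR}_i = \beta_0 + \sum_{k} \beta_k \, x_{ik} + \gamma_{c(i)} + \varepsilon_i$$

where $x_{ik}$ indicates whether ad copy feature $k$ (keyword, phrase, or image component) is present in ad $i$, $\gamma_{c(i)}$ is a fixed effect for the context (audience group, time of year, device) in which ad $i$ was served, and $\varepsilon_i$ is the error term.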
We use ElasticNet to spread the effect of features in the presence of multicollinearity and improve the explanatory power of the model:
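For reference, the standard ElasticNet objective (the notation here is assumed: $\lambda$ is the overall penalty strength and $\alpha$ the L1/L2 mixing weight):

$$\hat{\beta} = \arg\min_{\beta} \; \lVert \mathrm{IR} - X\beta \rVert_2^2 + \lambda \left( \alpha \lVert \beta \rVert_1 + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right)$$

The L2 term shrinks correlated features toward one another, spreading the effect across them, while the L1 term keeps the model sparse.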
“Machine Learning model estimates the impact of individual keywords, phrases, and image components in discovery ad copies.”
- Manisha Arora, Data Scientist
Outputs & Insights
Outputs from the machine learning model help us determine the significant features. The coefficient of each feature represents its percentage-point effect on CTR. In other words, if the mean CTR without the feature is X% and feature ‘xx’ has a coefficient of Y, then the mean CTR with feature ‘xx’ included will be (X + Y)%. This can help us determine the expected CTR if the most important features are included as part of the ad copies.
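For example (hypothetical numbers): if the mean CTR without a given feature is 1.2% and the model estimates a coefficient of 0.3 for it, the expected mean CTR for ad copies that include the feature is (1.2 + 0.3)% = 1.5%.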
Key takeaways (sample insights):
We analyze keywords and imagery tied to the unique value propositions of the product being advertised. There are six key value propositions we study in the model.
Shortcomings:
Although insights from DisCat are quite accurate and highly actionable, the model does have a few limitations:
1. The current model does not consider groups of keywords that might be driving ad performance instead of individual keywords (for example, the phrase “Buy Now” rather than the individual keywords “Buy” and “Now”).
2. Inference and predictions are based on historical data and aren’t necessarily an indication of future success.
3. Insights are based on industry-level data and may need to be tailored for a given advertiser.
DisCat breaks down exactly which features are working well for the ad and which ones have scope for improvement. These insights can help us identify high-impact keywords in the ads, which can then be used to improve ad quality, thus improving business outcomes. As a next step, we recommend testing the new ad copies with experiments to provide a more robust analysis. Google Ads’ A/B testing feature also allows you to create and run experiments to test these insights in your own campaigns.
Summary
Discovery Ads are a great way for advertisers to extend their social outreach to millions of people across the globe. DisCat helps break down Discovery ads by analyzing text and images separately and using advanced ML/AI techniques to identify the key aspects of the ad that drive greater performance. These insights help advertisers identify room for growth, identify high-impact keywords, and design better creatives that drive business outcomes.
Acknowledgement
Thank you to Shoresh Shafei and Jade Zhang for their contributions. Special
mention to Nikhil Madan for facilitating the publishing of this blog.
Notes