Google App Engine Basic Text Search
    
    
    
    
This post is part of the Who's @ Google I/O, a series of blog posts that give a closer look at developers who'll be speaking or demoing at Google I/O. This guest post is written by Brian Dorry from LTech, who is demoing as part of the Developer Sandbox.
Having trouble implementing search on your App Engine datastore? Try this technique for a basic search until official full-text support is ready.
Since adding Google App Engine to our technical tool belt in 2008, we at LTech have used the platform for a wide range of products and customer solutions. It is cost-effective, easy to use, and automatically scales your application on Google's immense infrastructure. Knowing that your applications run on the same technologies that Google's own systems take advantage of makes it the easy choice again and again.
From our own experience and participation in the developer community, the biggest complaint we hear is the lack of full-text search in the datastore. Google has marked this issue as "Started" but has not announced a release date yet, so alternative approaches will remain in play for the short term. We are big fans of Lucene (http://lucene.apache.org/), an open source indexing and search engine, but without the ability to save its index files to disk on App Engine, it becomes a non-starter.
We need a quick solution that is not CPU-intensive and still takes advantage of the Google infrastructure.
Problem
Taking advantage of the App Engine datastore, we can issue an inequality query to perform a basic "starts with" search. This can be a good solution for searching users, tags, and domains, and it works well for implementing an auto-complete feature for a search box.
Solution
Our example solution uses JDO to generate a query that instructs the datastore to return all records that start with the search string. This is accomplished by issuing a greater-than-or-equal condition against the search term, and a less-than condition against the search term concatenated with the Unicode replacement character ('\ufffd'). Because '\ufffd' sorts after almost every other character, the resulting query limits results to items that start with the search input, followed by any other characters.
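To make the range idea concrete without an App Engine project, the sketch below uses a Java `TreeSet` as a stand-in for the datastore's sorted property index (an assumption for illustration only; the real index lives in the datastore, and `PrefixRangeDemo`, `prefixRange` and `suggest` are hypothetical names). Asking for the half-open range from the prefix up to the prefix plus '\ufffd' returns exactly the entries that start with the prefix, and taking the first few entries of that range is the essence of an auto-complete box.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixRangeDemo {

    // Upper bound used by the post: the prefix followed by U+FFFD.
    static String upperBound(String prefix) {
        return prefix + "\ufffd";
    }

    // All values v with prefix <= v < prefix + U+FFFD, i.e. the same
    // half-open range the datastore query asks for.
    static NavigableSet<String> prefixRange(NavigableSet<String> index, String prefix) {
        return index.subSet(prefix, true, upperBound(prefix), false);
    }

    // Hypothetical auto-complete helper: the first 'limit' matches in sorted order.
    static List<String> suggest(NavigableSet<String> index, String prefix, int limit) {
        List<String> out = new ArrayList<>();
        for (String s : prefixRange(index, prefix)) {
            if (out.size() == limit) {
                break;
            }
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        // The TreeSet plays the role of the sorted index (assumption).
        NavigableSet<String> index = new TreeSet<>(Arrays.asList(
                "ap", "app", "apple", "application", "apply", "b", "banana"));
        System.out.println(prefixRange(index, "appl")); // [apple, application, apply]
        System.out.println(suggest(index, "ap", 2));    // [ap, app]
    }
}
```

With JDO, a similar cap on the number of suggestions could be applied to the real query with `Query.setRange(0, limit)`.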
This code uses JDO behind the scenes, but this trick will work with straight GQL as well. Let's take a look at the sample:
import java.util.List;
import javax.jdo.PersistenceManager;
import javax.jdo.Query;
(...)
@SuppressWarnings("unchecked")
public static List<Greeting> searchGreeting(String query) {
 // let's clean up the input: treat null as empty, lowercase, and trim
 query = (query != null ? query.toLowerCase() : "").trim();
 PersistenceManager pm = PMF.get().getPersistenceManager();
 Query q = pm.newQuery(Greeting.class);
 // set the filter and params: query <= content < query + '\ufffd'
 q.setFilter("content >= :1 && content < :2");
 // run query with param values and return results
 return (List<Greeting>) q.execute(query, (query + "\ufffd"));
}
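One detail the sample leaves implicit: the search input is lowercased, and datastore string comparisons are case-sensitive, so a record only matches if its stored `content` was lowercased as well. The post does not show how `Greeting` is saved, so the sketch below assumes a hypothetical `normalize` helper applied both when storing and when searching. It mirrors the sample's null check, `toLowerCase()` and `trim()`, but uses `Locale.ROOT` as a deliberate choice to avoid locale-dependent lowercasing; the comparison in `inPrefixRange` is plain `String.compareTo` standing in for the datastore's ordering.

```java
import java.util.Locale;

public class SearchNormalization {

    // Hypothetical helper: the same cleanup for stored values and search input.
    static String normalize(String s) {
        return (s != null ? s.toLowerCase(Locale.ROOT) : "").trim();
    }

    // True when 'stored' falls in [query, query + U+FFFD), the range the sample queries.
    static boolean inPrefixRange(String stored, String query) {
        return stored.compareTo(query) >= 0
                && stored.compareTo(query + "\ufffd") < 0;
    }

    public static void main(String[] args) {
        String saved = "Hello App Engine";
        String query = normalize("  HELLO ");
        // 'H' (U+0048) sorts before 'h' (U+0068), so the raw value is outside the range.
        System.out.println(inPrefixRange(saved, query));            // false
        // Normalizing at write time as well makes the prefix match.
        System.out.println(inPrefixRange(normalize(saved), query)); // true
    }
}
```

In practice this usually means storing a separate lowercased copy of the field for searching and keeping the original text for display.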
This code snippet searches the JDO-defined Greeting entity on its content field and returns the full Greeting payload for each match. Let's focus on the last two lines of code.
q.setFilter("content >= :1 && content < :2");
Here we set up the inequality. We are asking the datastore to return all matches where content falls between two values. But how does that define a search?
return (List<Greeting>) q.execute(query, (query + "\ufffd"));
When we set our parameters, we pass the same query value to both, with an extra character appended to the second one. This essentially tells the datastore to return all records that start with the query term. In terms of sets, the first condition returns the set of all words greater than or equal to the query term, including words that don't start with the query term at all. The second condition returns the set of all words less than the query term followed by '\ufffd', which includes every word that starts with the query term but also every word that sorts before it. The intersection of the two sets is the search result: all words starting with the search term.
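The set argument can be checked directly in plain Java. The sketch below builds the two sets for the term "car" over a small hypothetical word list, using `String.compareTo` as a stand-in for the datastore's string ordering (an assumption that holds for these ASCII words), and then intersects them with `retainAll`. `IntersectionDemo` and its methods are illustrative names only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class IntersectionDemo {

    // First condition of the query: every word >= term.
    static Set<String> atLeast(List<String> words, String term) {
        Set<String> out = new TreeSet<>();
        for (String w : words) {
            if (w.compareTo(term) >= 0) {
                out.add(w);
            }
        }
        return out;
    }

    // Second condition: every word < term + U+FFFD.
    static Set<String> below(List<String> words, String term) {
        String upper = term + "\ufffd";
        Set<String> out = new TreeSet<>();
        for (String w : words) {
            if (w.compareTo(upper) < 0) {
                out.add(w);
            }
        }
        return out;
    }

    // Intersection of the two sets: exactly the words that start with term.
    static Set<String> startsWith(List<String> words, String term) {
        Set<String> result = atLeast(words, term);
        result.retainAll(below(words, term));
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ca", "cab", "car", "carbon", "cart", "dog");
        System.out.println(atLeast(words, "car"));    // [car, carbon, cart, dog]
        System.out.println(below(words, "car"));      // [ca, cab, car, carbon, cart]
        System.out.println(startsWith(words, "car")); // [car, carbon, cart]
    }
}
```

Neither set alone is the answer: "dog" only passes the first condition, and "ca" and "cab" only pass the second.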
This simple-to-implement technique will solve many basic search problems until a full-text solution is available. It also works outside of JDO, with regular GQL statements. For a Python implementation, please see our friend Graeme's blog.
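To reuse the trick for other string fields such as user names, tags, or domains, the JDOQL filter string and its two bound values can be generated from a property name. The `PrefixFilter` class below is a hypothetical helper, not part of JDO or App Engine; it only produces the inputs for the real JDO calls `Query.setFilter` and `Query.executeWithArray`.

```java
public class PrefixFilter {

    // Hypothetical helper: the sample's JDOQL filter for any string property.
    static String filterFor(String property) {
        return property + " >= :1 && " + property + " < :2";
    }

    // The values bound to :1 and :2, in that order.
    static Object[] paramsFor(String prefix) {
        return new Object[] { prefix, prefix + "\ufffd" };
    }

    public static void main(String[] args) {
        System.out.println(filterFor("tag")); // tag >= :1 && tag < :2
        Object[] params = paramsFor("gre");
        System.out.println(params[0]);        // gre
    }
}
```

Inside a method like `searchGreeting`, this would be used as `q.setFilter(PrefixFilter.filterFor("content"))` followed by `q.executeWithArray(PrefixFilter.paramsFor(query))`, assuming the property name comes from trusted code rather than user input.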
Posted by Brian Dorry, LTech team