
9.7. Understanding the Google API Response

While the Google API grants you programmatic access to Google's Web index, it doesn't provide all the functionality available through the Google.com web site's search interface.

9.7.1. Can Do

The Google API, in addition to simple keyword queries, supports the following ["Special Syntaxes" in Chapter 1]:

site:

daterange:

intitle:

inurl: 

allintext:

allinlinks:

filetype:

info:

link:

related: 

cache:

9.7.2. Can't Do

The Google API does not support these special syntaxes:

phonebook: 

rphonebook:

bphonebook:

stocks:

While queries of this sort provide no individual results, aggregate result data is sometimes returned and can prove rather useful. googly.php [Hack #96], for instance, displays the number of results (estimatedTotalResultsCount).

9.7.3. The 10-Result Limit

While searches through the standard Google.com home page can be tuned ["Setting Preferences" in Chapter 1] to return 10, 20, 30, 50, or 100 results per page, the Google Web API limits the number to 10 per query. This doesn't mean, mind you, that the rest are not available to you, but it takes a wee bit of creative programming: looping through the results, 10 at a time [Hack #95].
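The looping idea can be sketched in a few lines of Python. This is only an illustration, not the code from the hack itself: the search callable and its (query, start, max_results) signature are hypothetical stand-ins for whatever function actually makes the API call, and start is assumed to be zero-based.

```python
from typing import Callable, Iterator, List

MAX_PER_QUERY = 10  # the per-query ceiling described above


def fetch_results(search: Callable[[str, int, int], List[dict]],
                  query: str, wanted: int) -> Iterator[dict]:
    """Yield up to `wanted` results, requesting at most 10 per call.

    `search(query, start, max_results)` is a hypothetical stand-in for
    the function that performs the real API request.
    """
    start = 0
    while start < wanted:
        batch_size = min(MAX_PER_QUERY, wanted - start)
        batch = search(query, start, batch_size)
        if not batch:
            break  # nothing more to fetch
        yield from batch
        if len(batch) < batch_size:
            break  # a short batch means the results ran out
        start += len(batch)
```

Passing the search function in as an argument keeps the loop independent of any particular SOAP library, so the same sketch works with a stub during testing.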

9.7.4. What's in the Results

The Google API provides both aggregate and per-result data in its result set.

9.7.4.1 Aggregate data

The aggregate data, information on the query itself and on the kinds and number of results that query turned up, consists of:


<documentFiltering>

A Boolean (true/false) value specifying whether or not results were filtered for very similar results or those that come from the same web host.


<searchComments>

Any commentary (e.g., a note about stop words being removed) Google might throw in that would usually be displayed just beneath the search box on a typical Google results page.


<estimatedTotalResultsCount>

An estimate of how many results might be found for your search in the Google index. This number may vary from invocation to invocation, moment to moment—thus the "estimated" proviso.


<estimateIsExact>

Google may sometimes be sure of its estimatedTotalResultsCount, in which case estimateIsExact will be set to true.


<resultElements>

The individual results themselves, returned as an array.


<searchQuery>

Your Google query, right back at you.


<startIndex>

The index of the first result in the current array of results. Assuming your query asked for a start of 0, the first result will have a startIndex of 1. If you asked for a start of 25, startIndex would be 26. Yes, I know it's confusing that start is zero-based, while startIndex is one-based, but that's the way the cookie crumbles, I'm afraid.


<endIndex>

The index of the last result in the current array of results. This is always whatever you set as start + maxResults in your query, unless that sum exceeds estimatedTotalResultsCount, in which case it is simply estimatedTotalResultsCount.


<searchTips>

May provide suggestions on better using Google, suitable for displaying to the end user.


<directoryCategories>

A list of directory categories, if any, associated with the query.


<searchTime>

The time spent by the Google server (in seconds) on your search.
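The relationship between the zero-based start you send and the one-based startIndex and endIndex you get back can be checked with a little arithmetic. The function below is a sketch that simply encodes the rules given for those two elements; its name and parameters are illustrative, and it assumes the query returned at least one result.

```python
from typing import Tuple


def result_window(start: int, max_results: int,
                  estimated_total: int) -> Tuple[int, int]:
    """Return the (startIndex, endIndex) pair expected for a query.

    `start` is zero-based, as in the request; both returned values are
    one-based, as in the response.
    """
    start_index = start + 1
    # endIndex is start + maxResults, capped at the estimated total.
    end_index = min(start + max_results, estimated_total)
    return start_index, end_index


print(result_window(0, 10, 1000))   # first page of a large result set
print(result_window(25, 10, 1000))  # a start of 25 gives a startIndex of 26
print(result_window(25, 10, 28))    # endIndex capped by the estimated total
```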

9.7.4.2 Individual search result data

The "guts" of a search result—the URLs, page titles, and snippets—are returned in a <resultElements> list. Each result consists of the following elements:


<summary>

The Google Directory summary, if available.


<URL>

The search result's URL; it consistently starts with http://.


<snippet>

A brief excerpt of the page with query terms highlighted in bold (HTML <b> </b> tags).


<title>

The page title in HTML.


<cachedSize>

The size in kilobytes (K) of the Google-cached version of the page, if available.


<relatedInformationPresent>

If set to 1, a related: search on the current result's URL will turn up something of use.


<hostName>

When you set filter to true in your query, only two results from the same hostname are included in your set of results. In the second of these results, hostName is set to the host from which the result came.


<directoryTitle>

The title under which this result appears in the Google Directory (http://directory.google.com, a.k.a. the Open Directory Project) if it is in the directory at all.


<directoryCategory>

The Google Directory category, if any, in which you'll find this result. <directoryCategory> consists of <fullViewableName>, the name given to the category itself, and <specialEncoding>, any special encoding assigned to the directory category at hand.
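Because the title and snippet come back as HTML, a script that wants plain text has to strip the bold tags itself. The sketch below shows one way to do that in Python. The XML fragment is hand-written and heavily simplified for illustration (the item wrapper and sample values are assumptions, not a captured API response); only the URL, title, and snippet element names come from the list above.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment; not a real API response.
SAMPLE = """
<item>
  <URL>http://www.example.com/</URL>
  <title>Example &lt;b&gt;Domain&lt;/b&gt;</title>
  <snippet>An &lt;b&gt;example&lt;/b&gt; page &amp;amp; more.</snippet>
</item>
"""


def strip_bold(text: str) -> str:
    """Remove <b> and </b> tags, then decode any HTML entities."""
    return html.unescape(re.sub(r"</?b>", "", text))


def parse_result(xml_text: str) -> dict:
    """Pull the URL, title, and snippet out of one result element."""
    item = ET.fromstring(xml_text.strip())
    return {
        "url": item.findtext("URL", default=""),
        "title": strip_bold(item.findtext("title", default="")),
        "snippet": strip_bold(item.findtext("snippet", default="")),
    }


result = parse_result(SAMPLE)
print(result["title"])
print(result["snippet"])
```

Missing elements fall back to an empty string, which suits optional fields such as the directory data.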

You will no doubt notice the conspicuous absence of PageRank. Google does not make PageRank available through anything but the official Google Toolbar [Hack #60]. You can get a general idea of a page's popularity by looking over the popularity bars in the Google Directory.
