
Hack 22. Capture Google Results in a Google Box

Add a little box of Google results to any page in your web site.

A Google box is a small HTML snippet that shows Google search results for whatever you're searching for. You might wish to display on your web page a box of pages similar to yours, pages that link to yours, or the top hits for a search that might be of interest to your readers.

Google boxes as a concept—the idea of taking a shortened version of Google results and integrating them into a web page or some other place—are not new. In fact, they're on their way to becoming ubiquitous when it comes to weblog and content management software. The Google box is easy to implement and was one of the first examples of Google API usage. As such, it enjoys the position of proto-application: a lot of developers whip up a Google box just to see if they can. Do a Google search for Google Box to see some other examples of Google boxes for different languages and applications.

What goes in a Google box, anyway? Why would anybody want to integrate them into a web page?

It depends on the page. Putting a Google box that searches for your name onto a weblog provides a bit of an ego boost and can give a little more information about you without seeming like bragging (yeah, right). If you have a topic-specific page, set up a Google box that searches for the topic (the more specific, the better the results). And if you've got a general news-type page, consider adding a Google box for the news topic. Google boxes can go pretty much anywhere, with Google updating its index often enough that the content of a Google box stays fresh.

2.4.1. The Code

Here's a classic piece of Perl code to produce a Google box as a regular text file filled with garden-variety HTML code, suitable for incorporating into any web page.

#!/usr/local/bin/perl
# google_box.pl
# A classic Google box implementation.
# Usage: perl google_box.pl <query> [<# results>]

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

use strict;

use SOAP::Lite;

# Bring in those command-line arguments; # results is optional.
@ARGV == 1 or @ARGV == 2
  or die "Usage: perl google_box.pl <query> [<# results>]\n";
my($query, $maxResults) = @ARGV;
$maxResults = 10 if (!defined $maxResults or $maxResults < 1 or $maxResults > 10);

# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wsdl");

# Query Google. The arguments are: key, query, start, maxResults,
# filter, restrict, safeSearch, lr, ie, oe.
my $results = $google_search ->
  doGoogleSearch(
    $google_key, $query, 0, $maxResults, "false", "",
    "false", "", "latin1", "latin1"
  );

# No results?
@{$results->{resultElements}} or die "no results\n";

# One link per result; fall back to the URL when there's no title.
print join("\n",
  map( {
    qq{<a href="$_->{URL}">} .
    ($_->{title} || $_->{URL}) .
    qq{</a> <br />}
  } @{$results->{resultElements}} )), "\n";

Save the code to a file called google_box.pl. Be sure to replace insert key here in the seventh line with your personal Google API key.
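The SOAP call is the only part of the script that needs an API key and a network connection; the HTML-building step can be tried on its own. Here's a minimal sketch of that step, assuming each result is a hash reference with URL and title keys, like the resultElements the script loops over. The sub names google_box_html and escape_attr and the sample data are made up for illustration, and unlike the script, this version escapes & and " in the URL so it stays valid inside the href attribute.

```perl
#!/usr/bin/perl
# Sketch: the Google box's HTML-building step, separated from the SOAP call.
# Assumes each result is a hashref with URL and title keys (as in the
# resultElements returned by the API); the sample data below is made up.
use strict;
use warnings;

# Escape the characters that would break a double-quoted href attribute.
sub escape_attr {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# One link per result, falling back to the (escaped) URL when the title
# is empty. Titles are passed through untouched, markup and all.
sub google_box_html {
    my @results = @_;
    return join "\n", map {
        my $url = escape_attr($_->{URL});
        qq{<a href="$url">} . ($_->{title} || $url) . qq{</a> <br />};
    } @results;
}

my @sample = (
    { URL => 'http://example.com/a',         title => 'Page A' },
    { URL => 'http://example.com/b?x=1&y=2', title => '' },
);
my $box = google_box_html(@sample);
print "$box\n";
```

Keeping the formatting in its own sub also makes it easy to change the box's markup (a list instead of line breaks, say) without touching the query code.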

2.4.2. Running the Hack

This Google box takes two bits of information on the command line ["How to Run the Hacks" in the Preface]: the query you want to run and the maximum number of results you'd prefer (up to 10). If you don't provide the number of results, the Google box will default to 10. Run it as follows:

% perl google_box.pl "query" # of results

where query is the search query you'd like to run against Google and # of results is the maximum number of results you want it to return.

This will print the results to the screen. To save them to a text file for inclusion in your web pages, specify the name of a file to save the results to, like so:

% perl google_box.pl "query" # of results > google_box.html

You can leave out specifying # of results and the script will default to 10 results in your Google box.
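The defaulting rule is simple enough to check in isolation. This sketch mirrors how the script treats # of results: anything missing, below 1, or above 10 becomes 10. The name clamp_results is hypothetical, and the check that the value is a whole number is an addition the script itself doesn't make.

```perl
#!/usr/bin/perl
# Sketch of google_box.pl's "# of results" rule: a missing, non-positive,
# or too-large value falls back to 10. clamp_results is a made-up name.
use strict;
use warnings;

sub clamp_results {
    my ($max) = @_;
    # Missing or non-numeric input is treated as "not provided"
    # (the whole-number check is an addition for this sketch).
    return 10 unless defined $max && $max =~ /^\d+$/;
    return ($max < 1 || $max > 10) ? 10 : $max;
}

printf "%s -> %d\n", defined $_ ? $_ : 'undef', clamp_results($_)
    for (undef, 0, 3, 10, 25);
```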


2.4.3. The Results

Here's a sample Google box for "camel book", referring to O'Reilly's popular Programming Perl title:

<a href="http://www.oreilly.com/catalog/pperl2/">oreilly.com -- Online Catalog:Programming Perl, 2nd Edition</a> <br />
<a href="http://www.oreilly.com/catalog/pperl3/">oreilly.com -- Online Catalog:Programming Perl, 3rd Edition</a> <br />
<a href="http://www.oreilly.com/catalog/pperl2/noframes.html">Programming Perl, 2nd Edition</a> <br />
<a href="http://www.tuxedo.org/~esr/jargon/html/entry/Camel-Book.html">Camel Book</a> <br />
<a href="http://www.cise.ufl.edu/perl/camel.html">The Camel Book</a> <br />

2.4.4. Integrating a Google Box

When you incorporate a Google box into your web page, you'll have two considerations: refreshing the content of the box regularly and integrating the content into your web page. To refresh the box, you'll need to run the program regularly, using something like cron under Unix or the Windows Task Scheduler.
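Rewriting google_box.html in place leaves a brief window in which a visitor's page could include a half-written box. One common way around that, sketched here on the assumption that your cron job (or scheduled task) runs a small Perl wrapper rather than redirecting output straight into the file, is to write to a temporary file in the same directory and rename() it over the old box; on Unix filesystems that rename swaps the file in a single step. The helper write_box_atomically is hypothetical, and the demo writes to a throwaway directory rather than your web root.

```perl
#!/usr/bin/perl
# Sketch: replace the Google box file in one step so a page request never
# includes a half-written box. write_box_atomically is a hypothetical helper.
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;
use File::Temp qw(tempfile tempdir);

# Write $html to a temporary file beside $path, then rename() it into
# place, so readers see either the old box or the new one.
sub write_box_atomically {
    my ($path, $html) = @_;
    my ($fh, $tmp) = tempfile('.google_box_XXXXXX', DIR => dirname($path));
    print {$fh} $html or die "Can't write $tmp: $!";
    close $fh or die "Can't close $tmp: $!";
    # File::Temp creates files readable only by their owner;
    # make the box readable by the web server.
    chmod 0644, $tmp or die "Can't chmod $tmp: $!";
    rename $tmp, $path or do {
        my $err = $!;
        unlink $tmp;
        die "Can't rename $tmp to $path: $err";
    };
    return $path;
}

# Demo: in practice $html would be the output of google_box.pl.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'google_box.html');
write_box_atomically($path, qq{<a href="http://example.com/">Example</a> <br />\n});
print "Wrote $path\n";
```

Creating the temporary file in the target's own directory matters: rename() only replaces a file atomically when both names are on the same filesystem.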

To include the content on your web page, Server Side Includes (SSI) are a simple and effective option. With SSI, including a Google box takes little more than something like this:

<!--#include virtual="./google_box.html" -->

For more information on using Server Side Includes, check out the NCSA SSI Tutorial (http://hoohoo.ncsa.uiuc.edu/docs/tutorials/includes.html), or search Google for Server Side Includes Tutorial.


Google boxes are a nice addition to your web pages, whether you run a weblog or a news site. But for many Google box searches, the search results won't change that often, especially for more common search words.

2.4.5. Making the Google Box Timely

As you might remember, Google has a daterange: search syntax available [Hack #16]. This version of the Google box takes advantage of it, allowing you to specify how many days back you want your query to run. If you don't provide a number, the default is 1, and there's no maximum, though I wouldn't go back much further than a month or so. The fewer days back you go, the more often the results in the Google box will change.

You'll need the Time::JulianDay module to get this hack rolling (http://search.cpan.org/search?query=time%3A%3Ajulianday).
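The daterange: syntax wants Julian day numbers rather than calendar dates. If you'd like to see the arithmetic, or don't have Time::JulianDay handy, here's a sketch that leans on one fixed fact instead: the Unix epoch, January 1, 1970, is Julian day 2440588. The helper daterange_clause is hypothetical; it covers the given number of whole days ending yesterday, and it counts days in UTC, whereas local_julian_day() in the script below uses local time, so the two can disagree by a day around midnight.

```perl
#!/usr/bin/perl
# Sketch: turn "N days back" into a daterange: clause without
# Time::JulianDay. daterange_clause is a hypothetical helper.
use strict;
use warnings;

# The Unix epoch (1970-01-01) is Julian day number 2440588.
use constant EPOCH_JDN => 2440588;

# Cover $days_back whole days, ending yesterday. Days are counted in UTC
# (an assumption of this sketch). $now defaults to the current time.
sub daterange_clause {
    my ($days_back, $now) = @_;
    $days_back = 1 unless defined $days_back && $days_back > 0;
    $now = time unless defined $now;
    my $today = EPOCH_JDN + int($now / 86400);
    return sprintf 'daterange:%d-%d', $today - $days_back, $today - 1;
}

my $clause = daterange_clause(7);
print "google hacks $clause\n";
```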


2.4.5.1 The code

The code is essentially identical to that of the classic Google box, save the additional bits that accept a number of days back on the command line, convert it to Julian days with Time::JulianDay, and build a daterange: query:

#!/usr/local/bin/perl
# timebox.pl
# A time-specific Google box.
# Usage: perl timebox.pl <query> [<# results> [<# days back>]]

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

use strict;

use SOAP::Lite;
use Time::JulianDay;

# Bring in those command-line arguments; only the query is required.
@ARGV >= 1 and @ARGV <= 3
  or die "Usage: perl timebox.pl <query> [<# results> [<# days back>]]\n";
my($query, $maxResults, $daysBack) = @ARGV;
$maxResults = 10 if (!defined $maxResults or $maxResults < 1 or $maxResults > 10);
$daysBack = 1 if (!defined $daysBack or $daysBack <= 0);

# Work out the range in Julian days: $daysBack days, ending yesterday.
my $yesterday = int(local_julian_day(time)) - 1;
my $firstDay  = $yesterday - $daysBack + 1;

# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wsdl");

# Query Google, restricted to pages indexed within the date range.
my $results = $google_search ->
  doGoogleSearch(
    $google_key, "$query daterange:$firstDay-$yesterday", 0,
    $maxResults, "false", "",  "false", "", "latin1", "latin1"
  );

# No results?
@{$results->{resultElements}} or die "no results\n";

print join("\n",
  map( {
    qq{<a href="$_->{URL}">} .
    ($_->{title} || $_->{URL}) .
    qq{</a> <br />}
  } @{$results->{resultElements}} )), "\n";

Save the code to a text file named timebox.pl. And, again, don't forget to replace insert key here with your Google API key.

2.4.5.2 Running the hack

The script takes up to three bits of information on the command line: the query you want to run, the maximum number of results you'd prefer (up to 10), and the number of days back that Google should consider:

% perl timebox.pl "query" # of results # days back

Replace query with your search query, # of results with the number of results you'd like (up to 10), and # days back with the number of days back you'd like to search for results.

Again, to send the results to a text file rather than the screen, call the script like this:

% perl timebox.pl "query" # of results # days back  > google_box.html

You can leave out specifying # of results and # days back and the script will default to 10 results and one day back, respectively.


2.4.5.3 The results

Here's a sample Google box for the top five "google hacks" results (this book included, hopefully), indexed the day before the time of this writing:

% perl timebox.pl "google hacks" 5 1

<a href="http://isbn.nu/0596004478">Google Hacks</a> <br />

<a href="http://isbn.nu/0596004478/shipsort">Google Hacks</a> <br />

<a href="http://isbn.nu/0596004478/amazonca">Amazon.ca: Google Hacks</a>  <br />

<a href="http://www.oreilly.de/catalog/googlehks/">Google Hacks</a> <br />

<a href="http://www.oreilly.de/catalog/googlehks/author.html">Google Hacks </a> <br />

2.4.5.4 Hacking the hack

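Perhaps you'd like your Google box to reflect "this day in 1999." No problem for this slightly tweaked version of the timely Google box, which replaces the days-back argument with an optional year, looks up today's month and day in that year with julian_day(), and prints a heading above the links: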

#!/usr/local/bin/perl
# timebox_thisday.pl
# A Google box for this day in <year>
# Usage: perl timebox_thisday.pl <query> <# results> [year]

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

use strict;

use SOAP::Lite;
use Time::JulianDay;

my @now = localtime(time);

# Bring in those command-line arguments; the year is optional.
@ARGV == 2 or @ARGV == 3
  or die "Usage: perl timebox_thisday.pl <query> <# results> [year]\n";
my($query, $maxResults, $year) = @ARGV;
$maxResults = 10 if ($maxResults < 1 or $maxResults > 10);
$year = 1999 unless defined $year and $year =~ /^\d{4}$/;

# Figure out when this day in the specified year is. localtime()
# counts months from 0, but julian_day() expects 1 through 12.
my $then = int julian_day($year, $now[4] + 1, $now[3]);

# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wsdl");

# Query Google for pages indexed on that single day.
my $results = $google_search ->
  doGoogleSearch(
    $google_key, "$query daterange:$then-$then", 0,
    $maxResults, "false", "",  "false", "", "latin1", "latin1"
  );

# No results?
@{$results->{resultElements}} or die "no results\n";

print join("\n",
  "$query on this day in $year:<p />",
  map( {
    qq{<a href="$_->{URL}">} .
    ($_->{title} || $_->{URL}) .
    qq{</a> <br />}
  } @{$results->{resultElements}} )), "\n";
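The month is the easy thing to trip over here: localtime() numbers months from 0 (January) to 11, while julian_day() takes a month from 1 to 12. As a sketch of the same "this day in <year>" lookup without Time::JulianDay, this version hands localtime()'s fields to timegm() from the core Time::Local module, which takes months in the same 0-based form, and then converts seconds to a Julian day number using the epoch's day number, 2440588. The helper this_day_jdn is hypothetical, and mapping February 29 to February 28 in non-leap years is an arbitrary choice (timegm() rejects the 29th in those years).

```perl
#!/usr/bin/perl
# Sketch: Julian day number for today's month and day in another year.
# this_day_jdn is a hypothetical helper. timegm() takes a 0-based month,
# just like the month field localtime() returns.
use strict;
use warnings;
use Time::Local qw(timegm);

use constant EPOCH_JDN => 2440588;    # Julian day number of 1970-01-01

sub is_leap {
    my ($y) = @_;
    return ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
}

# $year is a four-digit year; @lt is a localtime()-style list.
sub this_day_jdn {
    my ($year, @lt) = @_;
    my ($mday, $mon) = @lt[3, 4];
    # Feb 29 has no counterpart in a non-leap year; use Feb 28 instead.
    $mday = 28 if $mon == 1 && $mday == 29 && !is_leap($year);
    # Noon UTC keeps the division well away from day boundaries.
    my $epoch = timegm(0, 0, 12, $mday, $mon, $year);
    return EPOCH_JDN + int($epoch / 86400);
}

my $then = this_day_jdn(1999, localtime);
print "daterange:$then-$then\n";
```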

2.4.5.5 The results

The hacked version of the timely Google box runs just like the first version, except that you specify a year instead of a number of days back. Going back further than 1999 doesn't yield particularly useful results, given that Google came online in 1998.

Let's take a peek at how Netscape was doing in 1999:

% perl timebox_thisday.pl "netscape" 5 1999



netscape on this day in 1999:<p />
<a href="http://www.showgate.com/aol.html">WINSOCK.DLL and NETSCAPE Info for AOL Members</a> <br />
<a href="http://www.univie.ac.at/comment/99-3/993_23.orig.html">Comment 99/3 - Netscape Communicator</a> <br />
<a href="http://www.ac-nancy-metz.fr/services/docint/netscape.htm">NETSCAPE.</a> <br />
<a href="http://www.ac-nancy-metz.fr/services/docint/Messeng1.htm">Le Courrier électronique avec Netscape Messenger</a> <br />
<a href="http://www.airnews.net/anews_ns.htm">Setting up Netscape 2.0 for Airnews Proxy News</a> <br />
