
Hack 25. Track Result Counts over Time

Query Google for each day of a specified date range, counting the number of results at each time index.

Sometimes the results of a search aren't of as much interest as knowing the number thereof. How popular is a particular keyword? How many times is so-and-so mentioned? How do differing phrases or spellings stack up against each other?

You may also wish to track the popularity of a term over time to watch its ups and downs, spot trends, and notice tipping points. Combining the Google API and daterange: [Hack #16] syntax is just the ticket.

This hack queries Google for each day over a specified date range, counting the number of results for each day. This leads to a list of numbers that you could enter into Excel and chart, for example.

There are a couple of caveats before diving right into the code. First, the average keyword will tend to show more results over time as Google adds more pages to its index. Second, Google doesn't stand behind its date range search; results shouldn't be taken as gospel.

This hack requires the Time::JulianDay (http://search.cpan.org/search?query=Time%3A%3AJulianDay) Perl module.
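
To make the date arithmetic concrete, here is a minimal sketch of how a calendar date turns into a single-day daterange: query. It assumes the standard integer formula for converting a Gregorian date to a Julian day number instead of calling Time::JulianDay, so it runs with core Perl only; it should agree with the module's julian_day for ordinary dates. The names gregorian_to_julian and daterange_query are hypothetical helpers, not part of the hack.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the hack): Julian day number for a
# Gregorian calendar date, using the standard integer-arithmetic formula.
sub gregorian_to_julian {
    my ($year, $month, $day) = @_;
    my $adj = int((14 - $month) / 12);   # 1 for January/February, else 0
    my $y   = $year + 4800 - $adj;
    my $m   = $month + 12 * $adj - 3;
    return $day
         + int((153 * $m + 2) / 5)
         + 365 * $y
         + int($y / 4) - int($y / 100) + int($y / 400)
         - 32045;
}

# Hypothetical helper: restrict a query to one Julian day, in the same
# "daterange:N-N" form that goocount.pl builds.
sub daterange_query {
    my ($query, $julian) = @_;
    return "$query daterange:$julian-$julian";
}

# Print one query string per day for a short, three-day range.
my $first = gregorian_to_julian(2003, 10, 20);
my $last  = gregorian_to_julian(2003, 10, 22);
print daterange_query("OS X Panther", $_), "\n" for $first .. $last;
```

For example, 2003-10-24 works out to Julian day 2452937, so that day's query becomes OS X Panther daterange:2452937-2452937.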


2.7.1. The Code

Save the following code as a file named goocount.pl:

#!/usr/local/bin/perl
# goocount.pl
# Runs the specified query for every day between the specified
# start and end dates, returning date and count as CSV.
# Usage: goocount.pl query="{query}" start={date} end={date}
# where dates are of the format: yyyy-mm-dd, e.g. 2002-12-31

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";

use SOAP::Lite;
use Time::JulianDay;
use CGI qw/:standard/;

# For checking date validity.
my $date_regex = '(\d{4})-(\d{1,2})-(\d{1,2})';

# Make sure all arguments are passed correctly.
( param('query') and param('start') =~ /^(?:$date_regex)?$/
  and param('end') =~ /^(?:$date_regex)?$/ ) or
  die qq{usage: goocount.pl query="{query}" start={date} end={date}\n};

# Julian date manipulation. Missing dates default to yesterday.
my $query = param('query');
my $yesterday_julian = int local_julian_day(time) - 1;
my $start_julian = (param('start') =~ /$date_regex/)
  ? julian_day($1,$2,$3) : $yesterday_julian;
my $end_julian = (param('end') =~ /$date_regex/)
  ? julian_day($1,$2,$3) : $yesterday_julian;

# Create a new Google SOAP request.
my $google_search  = SOAP::Lite->service("file:$google_wdsl");

print qq{"date","count"\n};

# Iterate over each of the Julian dates for your query.
foreach my $julian ($start_julian..$end_julian) {
  my $full_query = "$query daterange:$julian-$julian";
  # Query Google
  my $result = $google_search ->
    doGoogleSearch(
      $google_key, $full_query, 0, 10, "false", "",  "false",
      "", "latin1", "latin1"
    );

  # Output
  print
    '"',
    sprintf("%04d-%02d-%02d", inverse_julian_day($julian)),
    qq{","$result->{estimatedTotalResultsCount}"\n};
}

Be sure to replace insert key here with your Google API key.

2.7.2. Running the Hack

Run the script from the command line ["How to Run the Hacks" in the Preface], specifying a query, a start date, and an end date.

Perhaps you'd like to track mentions of the latest Macintosh operating system (code name "Panther") leading up to, on, and after its launch (October 24, 2003). The following invocation sends its results to a comma-separated (CSV) file for easy import into Excel or a database:

% perl goocount.pl query="OS X Panther" \
start=2003-10-20 end=2003-10-28 > count.csv

Leaving off the > and CSV filename sends the results to the screen for your perusal:

% perl goocount.pl query="OS X Panther" \
start=2003-10-20 end=2003-10-28

If you want to track results over time, you could run the script every day (using cron under Unix or the scheduler under Windows) with no dates specified; the script then defaults both dates to yesterday and reports the count for that one day. Just use >> filename.csv to append to the file instead of overwriting it. Or you could get the results emailed to you for your daily reading pleasure.
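
One wrinkle with appending is that goocount.pl prints its "date","count" header on every run, so a plain >> repeats that header once per day. The following is a minimal sketch of a filter that avoids this, assuming you pipe the script's output through it; the file name logcount.pl and the helper append_counts are hypothetical, not part of the hack.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append rows to a CSV log, writing the header line
# only when the file does not exist yet or is empty.
sub append_counts {
    my ($file, @rows) = @_;             # each row is [ date, count ]
    my $needs_header = !-s $file;       # true if missing or zero-length
    open my $out, '>>', $file or die "Cannot append to $file: $!";
    print {$out} qq{"date","count"\n} if $needs_header;
    print {$out} qq{"$_->[0]","$_->[1]"\n} for @rows;
    close $out or die "Cannot close $file: $!";
    return;
}

# When given a log file name, read goocount.pl output from standard input
# and append only the data rows to that file.
if (@ARGV) {
    my $file = shift @ARGV;
    my @rows;
    while (my $line = <STDIN>) {
        # Keep rows such as "2003-10-28","130"; the header does not match.
        push @rows, [ $1, $2 ] if $line =~ /^"(\d{4}-\d{2}-\d{2})","(\d+)"/;
    }
    append_counts($file, @rows);
}
```

Under those assumptions, a daily job would run something like perl goocount.pl query="OS X Panther" | perl logcount.pl count.csv in place of the >> redirect.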

2.7.3. The Results

Here's that search for Panther, the new Macintosh operating system:

% perl goocount.pl query="OS X Panther" \
start=2003-10-20 end=2003-10-28
"date","count"
"2003-10-20","28"
"2003-10-21","39"
"2003-10-22","68"
"2003-10-23","48"
"2003-10-24","98"
"2003-10-25","40"
"2003-10-26","56"
"2003-10-27","79"
"2003-10-28","130"

Notice the expected spike in new finds on release day, October 24th.

2.7.4. Working with These Results

If you have a fairly short list, it's easy to just look at the results and see if there are any spikes or particular items of interest about the result counts. But if you have a long list or you want a visual overview of the results, it's easy to use these numbers to create a graph in Excel or your favorite spreadsheet program.

Simply save the results to a file, and then open the file in Excel and use the chart wizard to create a graph. You'll have to do some tweaking but just generating the chart provides an interesting overview, as shown in Figure 2-2.
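
If a spreadsheet isn't handy, a rough text chart can give a similar overview. The following is a minimal sketch, assuming the CSV layout that goocount.pl prints; the file name goochart.pl and the helpers parse_counts and text_chart are hypothetical, not part of the hack.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn goocount.pl's CSV lines into [date, count] pairs.
# Lines that do not look like data rows (such as the header) are ignored.
sub parse_counts {
    my (@lines) = @_;
    my @rows;
    for my $line (@lines) {
        push @rows, [ $1, $2 ] if $line =~ /^"(\d{4}-\d{2}-\d{2})","(\d+)"/;
    }
    return @rows;
}

# Hypothetical helper: one text bar per day, scaled so that the busiest
# day spans $width characters.
sub text_chart {
    my ($width, @rows) = @_;
    my $max = 0;
    for my $row (@rows) {
        $max = $row->[1] if $row->[1] > $max;
    }
    return map {
        my $len = $max ? int($_->[1] * $width / $max + 0.5) : 0;
        sprintf "%s %5d %s", $_->[0], $_->[1], '#' x $len;
    } @rows;
}

# When given one or more CSV file names, chart their contents.
if (@ARGV) {
    print "$_\n" for text_chart(50, parse_counts(<>));
}
```

Running perl goochart.pl count.csv would then print one bar per day, which is often enough to spot a spike before reaching for the chart wizard.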

Figure 2-2. An Excel graph tracking mentions of Mac OS X Panther


2.7.5. Hacking the Hack

You can render the results as a web page by altering the output portion of the code ever so slightly, as shown here, and directing the output to an HTML file (> filename.html):

...
print
  header( ),
  start_html("GooCount: $query"),
  start_table({-border=>undef}, caption("GooCount:$query")),
  Tr([ th(['Date', 'Count']) ]);

foreach my $julian ($start_julian..$end_julian) {
  my $full_query = "$query daterange:$julian-$julian";
  my $result = $google_search ->
    doGoogleSearch(
      $google_key, $full_query, 0, 10, "false", "",  "false",
      "", "latin1", "latin1"
    );

  print
    Tr([ td([
      sprintf("%04d-%02d-%02d", inverse_julian_day($julian)),
      $result->{estimatedTotalResultsCount}
    ]) ]);
}

print
  end_table( ),
  end_html;
