
Hack 37. Find the Largest Page

We all know about Feeling Lucky with Google. But how about Feeling Large?

Google sorts your search results by PageRank, which certainly makes sense. Sometimes, however, you have a different focus in mind and want the results ordered some other way. Recency is one ordering that comes to mind. Size is another.

In the same way that Google's "I'm Feeling Lucky" button redirects you to the search result with the highest PageRank, this hack sends you directly to the largest result, measured by the cached size (in kilobytes) that Google reports for each page.

This hack works rather nicely in combination with repetition [Hack #15].


2.19.1. The Code

Save the following code as a CGI script ["How to Run the Hacks" in the Preface] named goolarge.cgi in your web server's cgi-bin directory. Be sure to replace insert key here with your Google API key. The script expects to find the GoogleSearch.wsdl file in the same directory.

#!/usr/local/bin/perl
# goolarge.cgi
# A take-off on "I'm Feeling Lucky," redirects the browser to the largest
# (size in K) document found in the first n results.  n is set by number
# of loops x 10 results per.
# goolarge.cgi is called as a CGI with form input

use strict;

use SOAP::Lite;
use CGI qw/:standard/;

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

# Number of times to loop, retrieving 10 results at a time.
my $loops = 10;

# Display the query form.
unless (param('query')) {
  print
    header( ),
    start_html("GooLarge"),
    h1("GooLarge"),
    start_form(-method=>'GET'),
    'Query: ', textfield(-name=>'query'),
    ' &nbsp; ',
    submit(-name=>'submit', -value=>"I'm Feeling Large"),
    end_form( ), p( ), end_html( );
}

# Run the query.
else {
  my $google_search  = SOAP::Lite->service("file:$google_wsdl");
  my($largest_size, $largest_url) = (0, '');

  # Offsets run 0, 10, 20, ... for exactly $loops requests.
  for (my $offset = 0; $offset < $loops*10; $offset += 10) {

    my $results = $google_search -> 
      doGoogleSearch(
        $google_key, param('query'), $offset, 
        10, "false", "",  "false", "", "latin1", "latin1"
      );

    # Stop paging once Google runs out of results.
    last unless $results and @{$results->{'resultElements'} || []};

    # Keep track of the largest size and its associated URL.
    foreach (@{$results->{'resultElements'}}) {
      # cachedSize looks like "12k"; strip the trailing "k".
      my $size = substr($_->{cachedSize} || '0k', 0, -1);
      ($largest_size, $largest_url) = ($size, $_->{URL})
        if $size > $largest_size;
    }
  }

  # Redirect the browser to the largest result, if there was one.
  if ($largest_url) {
    print redirect($largest_url);
  }
  else {
    print header( ), start_html("GooLarge"), p('No results'), end_html( );
  }
}


2.19.2. Running the Hack

Point your web browser at the goolarge.cgi CGI script. Enter a query and click the "I'm Feeling Large" button. You'll be transported directly to the largest page matching your query, at least among the first results checked (by default, 100 results: 10 loops of 10 results apiece).
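
To see the selection step on its own, without a Google API key or a network connection, here is a minimal sketch that picks the largest entry from a list of result elements. The @results data and the largest_result helper are hypothetical and exist only for illustration; the sketch assumes each element carries a URL and a cachedSize string such as "101k", as the script above does.

```perl
use strict;
use warnings;

# Hypothetical result elements; real ones come back from doGoogleSearch.
my @results = (
  { URL => 'http://example.com/a', cachedSize => '12k'  },
  { URL => 'http://example.com/b', cachedSize => '101k' },
  { URL => 'http://example.com/c', cachedSize => ''     },  # no cached copy
  { URL => 'http://example.com/d', cachedSize => '47k'  },
);

# Return the URL and size (in K) of the largest result, or an empty
# list if no result reports a usable size.
sub largest_result {
  my @elements = @_;
  my ($largest_size, $largest_url) = (0, undef);
  foreach my $element (@elements) {
    # Pull the leading digits out of a value such as "101k".
    my ($size) = ($element->{cachedSize} || '') =~ /^(\d+)/;
    next unless defined $size;
    ($largest_size, $largest_url) = ($size, $element->{URL})
      if $size > $largest_size;
  }
  return defined $largest_url ? ($largest_url, $largest_size) : ();
}

my ($url, $size) = largest_result(@results);
print "$url ($size K)\n";
```

Results without a cached size are skipped in this sketch, so a page that Google has not cached can never be chosen as the largest.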

2.19.3. Usage Examples

Perhaps you're looking for bibliographic information about a famous person. A regular Google search may net you little more than passing mentions on a plethora of content-light web pages. Running the same query through this hack sometimes turns up pages with extensive bibliographies.

Maybe you're looking for information about a state. Try queries for the state name along with related information, such as motto, capital, or state bird.
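
Because the form submits with GET and the script only looks at the query parameter, queries like these can also be bookmarked or linked as plain URLs. The following sketch builds such URLs; the server address, the sample state, and the encode_query and goolarge_url helpers are assumptions made for illustration, and the hand-rolled encoder is meant for plain ASCII queries only.

```perl
use strict;
use warnings;

# Hypothetical location of the script; substitute your own server.
my $script_url = 'http://www.example.com/cgi-bin/goolarge.cgi';

# Percent-encode an ASCII query for use in a URL (spaces become %20).
sub encode_query {
  my ($text) = @_;
  $text =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
  return $text;
}

# Build a GET URL that passes the query to goolarge.cgi.
sub goolarge_url {
  my ($query) = @_;
  return "$script_url?query=" . encode_query($query);
}

print goolarge_url($_), "\n"
  for ('Vermont motto', 'Vermont capital', 'Vermont "state bird"');
```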

2.19.4. Hacking the Hack

This hack isn't so much hacked as tweaked. By changing the value assigned to the $loops variable in the line my $loops = 10; you can alter the number of results that the script checks before redirecting you to the largest one. Remember, the maximum number of results checked is the number of loops multiplied by 10 results per loop. The default of 10 considers the top 100 results. A $loops value of 5 would consider only the top 50; 20, the top 200; and so forth. Each loop is a separate request to the Google API, so larger values take longer to run.
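
The arithmetic can be checked with a short sketch. It assumes each loop requests one page of 10 results, starting at offsets 0, 10, 20, and so on; the offsets_for helper is hypothetical and not part of the hack itself.

```perl
use strict;
use warnings;

# List the starting offsets requested for a given $loops value,
# assuming 10 results per request.
sub offsets_for {
  my ($loops) = @_;
  return map { $_ * 10 } 0 .. $loops - 1;
}

foreach my $loops (5, 10, 20) {
  my @offsets = offsets_for($loops);
  printf "loops=%-2d checks the top %3d results (offsets %d to %d)\n",
    $loops, scalar(@offsets) * 10, $offsets[0], $offsets[-1];
}
```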
