
Hack 48 Getting Friendly with FOAFBot


Come to grips with the Semantic Web by using FOAFBot to find out information about your friends and strangers alike.

The Semantic Web is envisioned as the next generation of the Web. Instead of being made up of just web pages, the Semantic Web uses languages that store information in a way that computers can understand. Using standard languages like RDF (http://www.w3.org/TR/rdf-syntax-grammar), RDFS, and OWL (http://www.w3.org/TR/owl-features), users can create files called ontologies that define classes of things and the properties that apply to them. People can then make instances of classes that anyone has defined on the Semantic Web.

FOAF, the Friend-Of-A-Friend ontology (http://www.foaf-project.org), is one of the more popular ontologies and Semantic Web applications. The ontology defines a class called Person, and the related properties, such as name, email address, web page address, photographic depictions, and, most importantly, whom the person knows. When people create FOAF data about themselves and their friends, they can point to the FOAF files of their friends. Those files will, in turn, give information about the friends and point to the FOAF files of people the friend knows. This branches out to form a large social network. Some common properties of people in FOAF are listed in Table 7-1.

Table 7-1. FOAF properties
accountName

accountServiceHomepage

aimChatID

based_near

currentProject

depiction

dnaChecksum

family_name

firstName

fundedBy

geekcode 

gender

givenname

holdsAccount

homepage

icqChatID

img

interest

jabberID

knows

mbox 

mbox_sha1sum

msnChatID

myersBriggs

name

nick

page

pastProject

phone

plan

publications

schoolHomepage

surname

title

topic

topic_interest

weblog

workInfoHomepage

workplaceHomepage 

yahooChatID


Many web applications show FOAF data and the resulting networks. Foafnaut (http://www.foafnaut.org) is an SVG-based visualization of the FOAF networks. Foaf-a-matic is a web-based form that automatically creates a FOAF file without requiring the user to learn the Semantic Web languages. Several other applications are linked from the FOAF Project web site (http://foaf-project.org).

Edd Dumbill created the first FOAFBot, which could be queried for personal information about any person in the network, including who they know. More information, including his original Python source code, is available at http://usefulinc.com/foaf/foafbot. This hack will present the steps required to create your own FOAF-aware IRC agent.

7.6.1 Parsing a FOAF File

FOAF files are written in RDF, usually serialized as RDF/XML; the FOAF vocabulary itself is described using RDFS and OWL, the Web Ontology Language. Writing a good RDF parser would take a long time, but luckily, many are available for free on the Web. One of the most popular is Jena, developed at HP Labs (http://www.hpl.hp.com/semweb/jena.htm). It is a Java-based parser, available in a single JAR file. The online documentation is excellent, and the API is relatively intuitive. In this section, you will be taken through the steps of loading a FOAF file with Jena, retrieving the relevant information, and storing it in a data structure.

Before you start, there are some Semantic Web basics that are worth knowing. Everything on the Semantic Web (files, classes, properties, and instances) is identified by its URI. A URI (Uniform Resource Identifier) is a web address for the concept. URIs generally take the form of the web address of the file, followed by a "#" and the ID of a concept. For example, the URI of a FOAF file may be http://example.com/myFoaf.rdf. If, within that file, you defined an instance of the Person class with the ID "BobSmith," the URI for Bob would be http://example.com/myFoaf.rdf#BobSmith.
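As a small illustration of the file-address-plus-ID convention, the following sketch builds a concept URI and pulls the ID back out with the standard java.net.URI class. The class and method names (UriSketch, conceptUri, conceptId) are made up for this example and are not part of Jena or FOAF.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Jena or FOAF.
class UriSketch {

    // Build a concept URI from the address of the file and the concept's ID.
    static String conceptUri(String fileAddress, String id) {
        return fileAddress + "#" + id;
    }

    // Recover the ID (the fragment after '#'), or null if there is none.
    static String conceptId(String conceptUri) {
        return URI.create(conceptUri).getFragment();
    }

    public static void main(String[] args) {
        String bob = conceptUri("http://example.com/myFoaf.rdf", "BobSmith");
        System.out.println(bob);             // http://example.com/myFoaf.rdf#BobSmith
        System.out.println(conceptId(bob));  // BobSmith
    }
}
```

Not every vocabulary uses the "#" form; the FOAF terms themselves, such as http://xmlns.com/foaf/0.1/Person, end in a path segment instead, which is why the text says URIs "generally" take this shape.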

A statement on the Semantic Web takes a form called a triple. As you might expect, a triple has three parts: subject, predicate, and object. The subject is the thing being described. The predicate is the property of the subject that is being described, and the object is the value of the property. For example, say there was a property "age." Table 7-2 shows an example of a triple representing Bob Smith, age 21.

Table 7-2. Example of a triple

Subject     Predicate   Object
BobSmith    age         21


Since everything on the Semantic Web is identified by a URI, every property, class, and instance in the triple is actually identified by its URI. The full triple for the example in the table would be:

Subject:   http://example.com/myFoaf.rdf#BobSmith
Predicate: http://example.com/another.rdf#age
Object:    21

Here, the object "21" is just a literal value, so it does not get a URI. If you wanted to connect two objects—say Bob Smith and Joe Schmoe—in a triple, there would be three URIs:

Subject:   http://example.com/myFoaf.rdf#BobSmith
Predicate: http://example.com/another.rdf#knows
Object:    http://example.com/myFoaf.rdf#JoeSchmoe

A general familiarity with this triple and URI structure will make the Jena output easier to understand and work with.
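To make the triple structure concrete, here is a minimal sketch of a triple as a plain Java class. It is only an illustration: TripleSketch is a hypothetical name, and Jena represents triples with its own Statement type rather than anything like this. The printed form is loosely modeled on the N-Triples notation, in which URIs appear in angle brackets and literals in quotes.

```java
// Hypothetical class for illustration; Jena uses its own Statement type.
class TripleSketch {
    final String subject;           // URI of the thing being described
    final String predicate;         // URI of the property
    final String object;            // URI of another resource, or a literal value
    final boolean objectIsLiteral;  // true for values such as "21"

    TripleSketch(String subject, String predicate, String object,
                 boolean objectIsLiteral) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.objectIsLiteral = objectIsLiteral;
    }

    // URIs are wrapped in angle brackets; literals are wrapped in quotes.
    @Override
    public String toString() {
        String obj = objectIsLiteral ? "\"" + object + "\"" : "<" + object + ">";
        return "<" + subject + "> <" + predicate + "> " + obj + " .";
    }

    public static void main(String[] args) {
        String bob = "http://example.com/myFoaf.rdf#BobSmith";
        String joe = "http://example.com/myFoaf.rdf#JoeSchmoe";
        System.out.println(new TripleSketch(
            bob, "http://example.com/another.rdf#age", "21", true));
        System.out.println(new TripleSketch(
            bob, "http://example.com/another.rdf#knows", joe, false));
    }
}
```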

To begin coding, you will need a class to store all of the FOAF information about a person. The class should be able to hold every property available in FOAF. Each property value will be stored as a string; however, a person can have multiple values for any property (e.g., a person can have multiple email addresses). Thus, the class will keep a Hashtable that maps each property name to a Vector of Strings holding its values:

import java.util.*;

public class Person {

    // Store the info in a hash of Vectors: each key is a FOAF property
    // name (e.g., "mbox") and each value is a Vector of Strings.
    public Hashtable foafData = new Hashtable( );

    public Person( ) {
        // For now, we will leave this blank...
    }
}
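If you are using a newer Java release, the same "hash of Vectors" idea can be written with generics, which removes the casts. The following is only a sketch under that assumption; PersonSketch and its add and get methods are hypothetical names, not part of the hack's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generic variant of the Person class, for illustration only.
class PersonSketch {

    // Property name (e.g., "mbox") mapped to every value seen for it.
    private final Map<String, List<String>> foafData = new HashMap<>();

    // Record one value for a property, ignoring exact duplicates.
    void add(String property, String value) {
        List<String> values =
            foafData.computeIfAbsent(property, k -> new ArrayList<>());
        if (!values.contains(value)) {
            values.add(value);
        }
    }

    // Return all values for a property; empty if none were recorded.
    List<String> get(String property) {
        List<String> values = foafData.get(property);
        return values == null ? Collections.<String>emptyList() : values;
    }

    public static void main(String[] args) {
        PersonSketch p = new PersonSketch();
        p.add("mbox", "mailto:bob@example.com");
        p.add("mbox", "mailto:bob@work.example.com");
        System.out.println(p.get("mbox"));
    }
}
```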

With the class in hand, you need to parse the FOAF file and add the correct values to an instance of the Person class. To parse a file in Jena, you first create a model and then read the FOAF file into the model. The FOAF filename should be given by its address on the Web:

import java.util.*;
import com.hp.hpl.jena.rdf.model.*;

public class Foaf {

    // Every known Person, keyed by email address or mbox_sha1sum.
    private static Hashtable foafHash = new Hashtable( );
    private static String inputFile = "http://www.cs.umd.edu/~golbeck/foaf.rdf";

    public static void main (String argv[]) {
        // Create an empty model, then read the FOAF file into it.
        Model model = ModelFactory.createDefaultModel( );
        model.read(inputFile);
    }

}

Once the model has parsed the file, you have to retrieve the triples. The Jena web docs are useful in this respect. The following code iterates through every statement in the model, one subject at a time:

// Get a list of the subjects.
ResIterator it = model.listSubjects( );

while (it.hasNext( )) {
    Resource subject = it.nextResource( );

    // Get all the properties of the current subject.
    StmtIterator statements = subject.listProperties( );

    while (statements.hasNext( )){
        // This statement is a triple (subject, predicate, and object)
        Statement s = statements.nextStatement( );
    }
}

Now that you have access to the triples in the file, storing the FOAF data comes down to a basic series of if statements. Each time a new subject is encountered, you create an instance of the Person class. For each of the properties of the subject, you will check the URI of the predicate and, if it is a FOAF property, add the value to the proper Vector in the Person's Hashtable.

while (it.hasNext( )) {
    Resource subject = it.nextResource( );

    // Create the person that this subject may represent.
    Person p = new Person( );
    boolean isPerson = false;

    // Get all of the properties of the current subject.
    StmtIterator statements = subject.listProperties( );

    while (statements.hasNext( )){
        // This statement is a triple: subject, predicate, and object.
        Statement s = statements.nextStatement( );

        // Check to see if this subject is actually a FOAF Person.
        if(s.getPredicate( ).toString( ).equals(
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type") &&
            s.getObject( ).toString( ).equals("http://xmlns.com/foaf/0.1/Person")) {

            isPerson = true;
        }

        // Now check for each foaf property and add it.
        String base = "http://xmlns.com/foaf/0.1/";
        String key = s.getPredicate( ).toString( );
        if (key.startsWith(base)) {
            Vector v = (Vector) p.foafData.get(key.substring(base.length( )));
            if (v == null) {
                v = new Vector( );
                p.foafData.put(key.substring(base.length( )), v);
            }

            v.add(s.getObject( ).toString( ));
        }
    } // End statement loop.

In the preceding example, the String base is placed within the loop for clarity. Since it is always the same, that line can easily be moved somewhere else in the code to prevent the step of redeclaring the variable on each iteration.
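To experiment with this filtering logic without a live Jena model, the sketch below runs the same two checks over a hard-coded array of triples. It is an illustration only: FoafGroupingSketch and collectPeople are hypothetical names, the triples are plain String arrays rather than Jena Statements, and the base URI is hoisted into a constant, as suggested above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the Jena loop; triples are {subject, predicate, object}.
class FoafGroupingSketch {
    static final String RDF_TYPE =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String FOAF = "http://xmlns.com/foaf/0.1/";

    // For each subject typed as foaf:Person, collect its FOAF properties,
    // keyed by the property name with the FOAF base URI stripped off.
    static Map<String, Map<String, List<String>>> collectPeople(String[][] triples) {
        Map<String, Map<String, List<String>>> bySubject = new LinkedHashMap<>();
        Set<String> people = new HashSet<>();

        for (String[] t : triples) {
            String subject = t[0], predicate = t[1], object = t[2];

            // Is this subject declared to be a FOAF Person?
            if (predicate.equals(RDF_TYPE) && object.equals(FOAF + "Person")) {
                people.add(subject);
            }

            // Keep only predicates from the FOAF vocabulary.
            if (predicate.startsWith(FOAF)) {
                bySubject
                    .computeIfAbsent(subject, k -> new LinkedHashMap<>())
                    .computeIfAbsent(predicate.substring(FOAF.length()),
                                     k -> new ArrayList<>())
                    .add(object);
            }
        }

        // Throw away subjects that never turned out to be a Person.
        bySubject.keySet().retainAll(people);
        return bySubject;
    }
}
```

Because the type statement can appear anywhere among a subject's statements, the sketch decides who is a Person only after every triple has been read, just as the hack's code waits until the statement loop has ended.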

There are two issues to address before adding this Person object, p, to the Hashtable. First, a file on the Semantic Web can contain information about anything; there is no requirement that a FOAF file contain only FOAF data. As the file is parsed, it is necessary to confirm that the subject you are parsing is actually a FOAF Person. If it is, you must add it to a Hashtable that will store all of the instances of your Person class. If it is not, you should just throw away the Person object that you created. Second, a Person may already be stored under the same key, in which case the two objects must be merged, as discussed after the code. The following code makes use of the foafHash declared previously:

if (isPerson) {

    Vector mboxes = (Vector) p.foafData.get("mbox");
    if (mboxes != null) {
        for (int i = 0; i < mboxes.size( ); i++) {
            String mail = (String) mboxes.elementAt(i);

            // Sometimes, people preface their mail address with mailto:
            // We'll take it off to make the interface nicer. This has to
            // happen before the lookup, because the keys are stored
            // without the prefix.
            if (mail.startsWith("mailto:"))
                mail = mail.substring(7);

            if (foafHash.get(mail) != null && foafHash.get(mail) != p) {
                merge(p, mail);
            }
            foafHash.put(mail, p);
        }
    }

    Vector sums = (Vector) p.foafData.get("mbox_sha1sum");
    if (sums != null) {
        for (int i = 0; i < sums.size( ); i++) {
            String mail = (String) sums.elementAt(i);
            if (foafHash.get(mail) != null && foafHash.get(mail) != p) {
                merge(p, mail);
            }
            foafHash.put(mail, p);
        }
    }
}

} // End of the while loop over subjects.

Notice that in both loops, before the instance of the Person class is added to the Hashtable, the following logic is required:

        if (foafHash.get(key) != null && foafHash.get(key) != p) {
            merge(p, key);
        }

Because you may have already parsed information about this Person somewhere else in the file and added an instance of the Person class to the Hashtable, there may already be another instance of the class stored with different information. In this case, you need to merge the data from the two Person objects. The if statement checks to make sure that the stored Person object is different from the current Person object to prevent unnecessarily merging identical objects. The merge function copies all of the information from the stored object into the current one; the caller then stores the current object in the Hashtable under that key, replacing the old entry.

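    private static void merge(Person p, String mail) {
        // q is the Person already stored under this key.
        Person q = (Person) foafHash.get(mail);

        // Go through every property stored for q.
        for (Enumeration e = q.foafData.keys( ) ; e.hasMoreElements( ) ;) {
            String curKey = (String)e.nextElement( );
            Vector qsData = (Vector)q.foafData.get(curKey);

            // Make sure p has a Vector for this property.
            Vector psData = (Vector)p.foafData.get(curKey);
            if (psData == null) {
                psData = new Vector( );
                p.foafData.put(curKey, psData);
            }

            // Go through each of q's values for this property.
            for (int i = 0 ; i < qsData.size( ); i++) {
                String curVal = (String)qsData.elementAt(i);
                // Don't add a value to p if it's already there.
                if (!psData.contains(curVal)){
                    // Add the value from q to p.
                    psData.add(curVal);
                }
            }
        }

        // The caller stores p under this key once merge returns, which
        // replaces q. Assigning q = p here would change only the local
        // variable, so it is not needed.
    }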

This code completes the parsing of a single FOAF file. It may seem complicated, but that is the bulk of everything that has to be done to build this IRC bot. The next two steps take advantage of all of this parsing with only a few more lines of code.
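One detail of the merge step is worth trying out in isolation: a person who was stored under several keys (say, two email addresses) should end up as a single object under all of them. The following sketch shows one way to do that with generic collections. It is an assumption-laden illustration, not the hack's code: MergeSketch, mergeInto, and repoint are hypothetical names, and people are represented here by their property maps alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helpers for illustration; not part of the hack's Foaf class.
class MergeSketch {

    // Copy every value from 'from' into 'into', skipping duplicates.
    static void mergeInto(Map<String, List<String>> into,
                          Map<String, List<String>> from) {
        for (Map.Entry<String, List<String>> e : from.entrySet()) {
            List<String> target =
                into.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
            for (String value : e.getValue()) {
                if (!target.contains(value)) {
                    target.add(value);
                }
            }
        }
    }

    // Make every index key that pointed at 'old' point at 'merged' instead.
    // Replacing a value through the entry is not a structural change, so it
    // is safe to do while iterating.
    static <P> void repoint(Map<String, P> index, P old, P merged) {
        for (Map.Entry<String, P> e : index.entrySet()) {
            if (e.getValue() == old) {
                e.setValue(merged);
            }
        }
    }
}
```

Calling mergeInto and then repoint keeps the index consistent even when the older object was registered under keys that the newer object does not list.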

7.6.2 Crawling FOAF Files

FOAF is interesting because it creates a social network—many people are interconnected through linked files. The previous code will parse a single file into the Hashtable, but to collect FOAF data, it is necessary to crawl over files that are linked together. This requires only a few additions. First, you can use a Vector to store the URIs of files to parse:

Vector uris = new Vector( );
Vector visited = new Vector( );  // URIs that have already been parsed.
uris.add(inputFile);

while (uris.size( ) > 0) {
    // Remove the first element in the Vector.
    inputFile = (String) uris.remove(0);
    visited.add(inputFile);

    /*
     * Here, we insert the previous code that parses the file and builds
     * our model. It is omitted from this example for brevity.
     */
}

You will parse each URL as outlined earlier. As you parse, one more if statement will be required to check for "see also" links. These links point to other files. When encountered, these links will be added to the Vector of URIs, unless the file has already been parsed or is already waiting in the Vector; FOAF files often point back at one another, and without this check the crawl would never finish. The following should be added to the list of if statements that checks for all of the other FOAF properties:

if (s.getPredicate( ).toString( ).equals(
        "http://www.w3.org/2000/01/rdf-schema#seeAlso")) {

    String target = s.getObject( ).toString( );
    if (!visited.contains(target) && !uris.contains(target)) {
        uris.add(target);
    }
}

With these two small changes, the code will now crawl along the semantic links in each file to parse every FOAF file connected to the network!
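The crawl loop can also be tried out without any network access. The sketch below is one possible arrangement, not the hack's code: CrawlSketch is a hypothetical class, the linksOf function stands in for "parse this file and return its seeAlso targets," and maxFiles is an assumed cap on how many files to visit. It uses a Set to remember every URI it has already queued, so files that link to each other are visited only once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical crawl frontier for illustration; no Jena or network calls.
class CrawlSketch {

    // Visit files breadth-first from 'start', following the links that
    // 'linksOf' reports, and stop after 'maxFiles' files.
    static List<String> crawl(String start,
                              Function<String, List<String>> linksOf,
                              int maxFiles) {
        List<String> visitedInOrder = new ArrayList<>();
        Set<String> seen = new HashSet<>();       // queued or already visited
        Deque<String> queue = new ArrayDeque<>();

        queue.add(start);
        seen.add(start);

        while (!queue.isEmpty() && visitedInOrder.size() < maxFiles) {
            String uri = queue.remove();
            visitedInOrder.add(uri);  // the real bot would parse the file here

            for (String next : linksOf.apply(uri)) {
                // Set.add returns false if the URI was already seen.
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return visitedInOrder;
    }
}
```

A cap such as maxFiles is a simple way to keep an experimental crawl short; the right value depends on how much of the network you want in your bot's database.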

The FOAF network is huge, and crawling through the whole lot can take days. To get your bot up and running quickly, consider skipping the crawl by eliminating this last section of code and instead listing a handful of FOAF files you want included in your bot's database.


7.6.3 Writing the IRC Interface

Finally, once the previous code has been executed, the Hashtable foafHash will contain all of our Person objects with the correct information. That will take place as an initialization step. The last step to complete FOAFBot is to create the IRC bot interface. Since this is Java-based code, it will use the PircBot API [Hack #35]. You can assume that the onMessage method is overridden to accept input from users in a channel. The rest of this step will just show how to handle requests from users in this context.

Our Person class has all of the information from FOAF, but you can decide which properties you want to be queryable through the IRC bot. All of the people our bot knows about are indexed by email address or by mbox_sha1sum, which is the result of applying the SHA-1 hash function to a mailto: identifier. For this reason, you will require users to ask for information about a person via an email address. The original FOAFBot also maintains a hash keyed by IRC nickname, since that is easier to find on an IRC channel. To support that, you would simply add another Hashtable to the preceding code, and add Person objects to it by looping over the nick Vector, just as with the email addresses. In the email-indexed bot, a sample query might look like this:

foafbot, name of golbeck@cs.umd.edu

Upon receiving this command, the bot looks up the address in the hash to retrieve the associated Person object and then puts together a response with the information stored in the object:

StringTokenizer t = new StringTokenizer(message);
if (t.hasMoreTokens( ) && t.nextToken( ).toLowerCase( ).equals(
        this.getName( ).toLowerCase( ) + ",")) {
    try {
        String query = t.nextToken( );
        if (query.equals("name")) {
            t.nextToken( );  // Eliminate the "of".
            String email = t.nextToken( );
            Person p = (Person) foafHash.get(email);
            String response = "";

            // p is null if the bot has never seen this address.
            Vector data = null;
            if (p != null) {
                data = (Vector)p.foafData.get("name");
            }
            if (data != null && data.size( ) > 0) {
                response = email + " is named ";
                for (int i = 0; i < data.size( ); i++) {
                    response += data.elementAt(i);

                    // This formats the response nicely with commas.
                    if (i + 1 < data.size( )) {
                        response += ", ";
                    }
                }
            }
            else {
                response = "I don't know the name of ";
                response += email;
            }
            sendMessage(channel, response);
        }
    ...

This is just one example of creating a response from the Person object. You can decide which features of FOAF to support and how to support them. With that, the FOAFBot is complete. This is not only an interesting hack by itself, but it also lays the groundwork for any other Semantic Web-based hacks. One of those, TrustBot [Hack #49], is next.
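If you decide to support more properties than name, it may help to separate parsing the command from building the response. The sketch below shows one way to do that, and also how a typed email address could be converted into an mbox_sha1sum key for people who publish only the hashed form. Everything here is an assumption for illustration: QuerySketch, parse, and mboxSha1sum are hypothetical names, and only standard Java classes are used, not the PircBot API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.StringTokenizer;

// Hypothetical helpers for illustration; not part of PircBot or the hack.
class QuerySketch {

    // Parse "<bot>, <property> of <email>" into {property, email}.
    // Returns null if the message is not addressed to the bot or is malformed.
    static String[] parse(String botName, String message) {
        StringTokenizer t = new StringTokenizer(message);
        if (t.countTokens() != 4) {
            return null;
        }
        if (!t.nextToken().equalsIgnoreCase(botName + ",")) {
            return null;
        }
        String property = t.nextToken();
        if (!t.nextToken().equalsIgnoreCase("of")) {
            return null;
        }
        return new String[] { property, t.nextToken() };
    }

    // Hex-encoded SHA-1 of the mailto: URI, the form used by mbox_sha1sum.
    static String mboxSha1sum(String email) {
        String uri = email.startsWith("mailto:") ? email : "mailto:" + email;
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(uri.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }
}
```

With helpers like these, the onMessage code could look up the parsed address first and, if nothing is found, try the hashed form before replying that the person is unknown.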

7.6.4 Running the Hack

In this hack, the Foaf.java file contains a main method. Since the bot is based on PircBot, you need to change that. By simply renaming the Foaf.java main method to an init method and calling that init method as one of the first steps in the main method of your PircBot-based bot, the FOAF data crawl will be initialized and stored before the bot joins a channel.

With this change, the only remaining step is to compile and run the bot as usual (see [Hack #35]). When the bot joins a channel, it will process any requests that you wrote code to handle.

A FOAFBot interface is demonstrated in Figure 7-5.

Figure 7-5. Using FOAFBot to find out about a user
figs/irch_0705.gif


Now you can use FOAFBot to find out about all the users in your channel.

Jennifer Golbeck
