
Hack 16 Filter Channel Lists


Even if you've already found a satisfactory IRC network, you may have missed some interesting channels. Discover them in the output from the LIST command.

One way of finding a relevant channel on a particular IRC network is to ask the network for the list of channels currently in use. Apart from guessing the names of these channels or finding them by word of mouth, you can apply appropriate filters to the list of all the available channels. To acquire such a list, you can use the LIST command, which returns the list of all public channels, together with their topic and number of users.

3.6.1 The Code

You can use the skeleton code from the RSS to IRC hack [Hack #66], again using the Net::IRC Perl module [Hack #33]. For better performance, precompile the regular expressions passed on the command line, because each one is matched against every channel in the list. Finally, you can use printf to pretty-print the matching channels with the columns nicely aligned.

Save the following as filterlist.pl:

#!/usr/bin/perl -w
# filterlist.pl - Filter a list of channels based on given criteria.
# MIT licence, (c) Petr Baudis <pasky@ucw.cz>.

use strict;

### Configuration section.
use vars qw ($nick $server $port);
$nick = 'filtelst';
$server = 'irc.freenode.net';
$port = 6667;

### Preamble.
use Net::IRC;

### Arguments munching and data structures setup.
# Arguments.
use vars qw ($chanre $topicre $userlimit);
($chanre, $topicre, $userlimit) = @ARGV;
$chanre ||= ''; $topicre ||= ''; $userlimit ||= 0;

# Precompile the patterns.
$chanre = qr/$chanre/i;
$topicre = qr/$topicre/i;

# List of matched channels, and maximal length of each field for pretty-printing.
use vars qw (@channels $chanlen $userlen);
$chanlen = 0; $userlen = 0; # Start at zero so the comparisons in on_list are defined.

# Prints the channel list once the whole list has been received.
sub list_channels {
  my (@channels) = @_;
  foreach my $chan (@channels) {
    my ($channel, $topic, $usercount) = @$chan;
    printf ("\%-${chanlen}s \%${userlen}d \%s\n", $channel, $usercount, $topic);
  }
}

### Connection initialization.
use vars qw ($irc $conn);
$irc = new Net::IRC;
$conn = $irc->newconn (Nick => $nick, Server => $server, Port => $port,
                       Ircname => 'Channels List Filter');

### The event handlers.
# Connect handler - we immediately try to get the channels list.
sub on_connect {
  my ($self, $event) = @_;
  $self->list ( );
}
$conn->add_handler ('welcome', \&on_connect);

# Received one channel item.
sub on_list {
  my ($self, $event) = @_;
  my (undef, $channel, $usercount, $topic) = $event->args;
  $topic = '' unless defined $topic; # Guard against a missing topic field.

  # Filter.
  return unless ($channel =~ $chanre);
  return unless ($topic =~ $topicre);
  return unless ($userlimit == 0
                 or ($userlimit < 0 ? $usercount <= -$userlimit
                            : $usercount >= $userlimit));

  # Enqueue for listing.
  push (@channels, [ $channel, $topic, $usercount ]);

  # Update the column widths used for pretty-printing.
  $chanlen = length ($channel) if (length ($channel) > $chanlen);
  $userlen = length ($usercount) if (length ($usercount) > $userlen);
}
$conn->add_handler ('list', \&on_list);

# Received the whole channels list.
sub on_listend {
  my ($self, $event) = @_;
  list_channels (@channels);
  exit;
}
$conn->add_handler ('listend', \&on_listend);

# Fire up the IRC loop.
$irc->start;
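
The two techniques the script relies on, precompiling a pattern with qr// and aligning columns with printf, can be tried without connecting to any server. The following is only a sketch: the sample rows and the helper name format_rows are hypothetical and are not part of filterlist.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample rows: [ channel, topic, user count ].
my @rows = (
    [ '#linux.cz', 'linux talk',    134 ],
    [ '#perl',     'Perl help',       7 ],
    [ '#linux-kr', 'Linux @ Korea',   4 ],
);

# Compile the pattern once; every later match reuses the compiled form.
my $topicre = qr/linux/i;

# Return the rows whose topic matches $re, formatted as aligned lines.
sub format_rows {
    my ($re, @rows) = @_;
    my @matched = grep { $_->[1] =~ $re } @rows;

    # Find the widest channel name and user count among the matches.
    my ($chanlen, $userlen) = (0, 0);
    foreach my $row (@matched) {
        my ($channel, undef, $usercount) = @$row;
        $chanlen = length($channel)   if length($channel)   > $chanlen;
        $userlen = length($usercount) if length($usercount) > $userlen;
    }

    # A "*" in the format takes the field width from the argument list.
    return map {
        sprintf("%-*s %*d %s", $chanlen, $_->[0], $userlen, $_->[2], $_->[1])
    } @matched;
}

print "$_\n" foreach format_rows($topicre, @rows);
```

Passing the widths through "*" instead of interpolating them into the format string is a design choice made for this sketch; the script above interpolates $chanlen and $userlen directly, which works just as well.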

3.6.2 Running the Hack

The script takes three arguments. The first one is a regular expression that will be used to filter the name of each listed channel (including the channel prefix, such as # or +). This regular expression can be left empty to find all channels. The second argument is another regular expression, which is used to filter the channel topics. The third and final argument is a population limit. If it is a positive number, at least that many users must be in the channel. If this argument is negative, there must be at most that many users in the channel. If the last argument is zero or missing, no user-count checking is performed.
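
The three criteria amount to a single yes-or-no test per channel. As a minimal sketch, that test could be written as a standalone function; the name channel_matches and the sample values below are hypothetical and serve only to illustrate how the sign of the population limit is interpreted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if a channel passes all three filters.
#   $userlimit > 0   at least that many users
#   $userlimit < 0   at most that many users
#   0 or undefined   no user-count check
sub channel_matches {
    my ($channel, $topic, $usercount, $chanre, $topicre, $userlimit) = @_;
    $userlimit ||= 0;

    return 0 unless $channel =~ $chanre;
    return 0 unless $topic   =~ $topicre;
    return 1 if $userlimit == 0;
    return $userlimit < 0 ? $usercount <= -$userlimit
                          : $usercount >=  $userlimit;
}

my $chanre  = qr/^#linux/i;
my $topicre = qr/hotline/i;
my @channel = ('#linux.cz', 'this is not a hotline', 134);

# The same channel, tested against a lower and an upper bound of three users.
foreach my $limit (3, -3) {
    my $verdict = channel_matches(@channel, $chanre, $topicre, $limit) ? 'keep' : 'skip';
    print "limit $limit: $verdict\n";
}
```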

The script prints a list of matching channels, together with a user count and topic for each one, all neatly formatted. Here's an example that finds all channels with names ending in a "nonword" character followed by two "word" characters, such as "-cs" or ".cz", a common notation for national channels. Each channel's topic must also contain "linux", and the channel must have at least three users:

% ./filterlist.pl '\W\w\w$' 'linux' 3
#linux.cz 134 ??? linux | toto neni hotline. this is not a hotline.
#linux.hu  15 nullinux
#linux-kr   4 Linux @ Korea
#linux.pl 208 potrzebuje kogoś co programuje w borlandzie /msg linuxer

Don't forget to adjust the configuration section of the script before executing it. Find something else to do while the script is running, as it can take quite some time to complete. Also, read the next section if you are running this on a large IRC network and the script gets disconnected before it has finished.

3.6.3 Hacking the Hack

The problem with the LIST command is that it can generate a massive amount of output on large IRC networks. The number of channels on the most popular IRC networks ranges from 50,000 to 200,000, and dealing with messages to and from those channels already takes up a fair amount of bandwidth. Fetching and processing the list therefore takes quite some time and a lot of bandwidth. Quite often, a server will disconnect you when the output exceeds the size of its output buffer (known as a send queue, or SendQ). Although the IRC protocol itself does not solve this problem, some IRC server daemons address it by refusing to execute the LIST command if the resulting list would be too big, or by trimming the list as appropriate.

One way of fixing this problem is to resort to ircd-specific features. There are a large number of forks of the original ircd as well as various rewrites. The original ircd is not used very widely, except on the IRCnet IRC network, as its feature set is rather traditional. EFnet mostly uses ircd-hybrid, which is a fork of the original ircd codebase (up to Version 6; Version 7 was a large-scale rewrite), and ircd-ratbox, which is an ircd-hybrid v7 fork. Another original ircd fork is ircu (Universal ircd), which is used on Undernet and Quakenet. If you're not lost yet, another original ircd fork is freenode's dancer-ircd IRC daemon.

If you are using an ircu-based network (such as Quakenet or Undernet), the LIST command comes with an extended syntax. You can chain several comma-separated criteria in its argument. <N and >N will match only channels with fewer than N users or more than N users, respectively. C<N and C>N filter the channels by channel age in minutes, while T<N and T>N perform a similar selection based on topic age. So, to list all channels with exactly three people and a topic set, you would send the command:

LIST <4,>2,T>0
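
Because <N and >N are strict comparisons, an inclusive range has to be widened by one on each side. A small sketch of a helper that does this conversion follows; the function name ircu_list_filter and its option names are hypothetical, and only the criteria described above are used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the argument for an ircu-style LIST command
# from inclusive user-count bounds and an optional "topic is set" flag.
sub ircu_list_filter {
    my (%opt) = @_;
    my @criteria;

    # "<N" means fewer than N users, so an inclusive maximum becomes N + 1.
    push @criteria, '<' . ($opt{max_users} + 1) if defined $opt{max_users};
    # ">N" means more than N users, so an inclusive minimum becomes N - 1.
    push @criteria, '>' . ($opt{min_users} - 1) if defined $opt{min_users};
    # "T>0" is the topic-age criterion used in the example above.
    push @criteria, 'T>0' if $opt{has_topic};

    return join(',', @criteria);
}

# Exactly three users and a topic set.
print 'LIST ', ircu_list_filter(min_users => 3, max_users => 3, has_topic => 1), "\n";
```

The resulting line would still have to be sent to the server as a raw command; how to do that depends on the client or module in use.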

IRCnet chose another approach by providing a service in the style of the original ircd, called ALIS (Advanced List Service). ALIS provides quite rich means for searching through the channel list: you can search by name (including wildcards), population, mode, and topic. You talk to IRCnet services through a special SQUERY name_of_service command (SQUERY service HELP usually returns some useful usage information).

The ALIS command for searching all Linux-related national channels, with a population of at least three, would then look like:

SQUERY ALIS :LIST #*.?? -min 3 -t linux
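
If you issue such queries often, the command line can be assembled from its parts. The following sketch uses a hypothetical helper, alis_list_query, and only the two options that appear in the example above (-min and -t); consult the service's own help for anything further.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: assemble an ALIS LIST query from a channel mask
# and the optional -min (population) and -t (topic) criteria.
sub alis_list_query {
    my ($mask, %opt) = @_;
    my @parts = ("LIST $mask");
    push @parts, "-min $opt{min}" if defined $opt{min};
    push @parts, "-t $opt{topic}" if defined $opt{topic};
    return 'SQUERY ALIS :' . join(' ', @parts);
}

# Linux-related national channels with at least three users.
print alis_list_query('#*.??', min => 3, topic => 'linux'), "\n";
```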

Hopefully this hack will have given you a good insight into the variety of methods that enable you to find channels of interest. You may never know what you're missing out on until you look.

Petr Baudis
