Ismael.pm: a Perl module to control YaCy-peers

Author: Marc Nause
Licence: LGPL
Version: 0.6.5 (29 November 2008)

Beware! This version of Ismael.pm does not work with current releases of YaCy. I am currently working on an updated version. Please contact me if you need a working version and I can provide you with a dev version.

What can Ismael.pm do?

Ismael.pm is an easy-to-use module that lets you gather information about YaCy-peers and use several of their functions.

What is YaCy?

If you don't know what YaCy is, you probably don't need Ismael.pm yet. YaCy is a P2P-based distributed web search engine. You can find more information about YaCy at http://yacy.net/.

How to install Ismael.pm

Download Ismael.pm from http://ismael.audioattack.de/download/ismael-0.6.5.tar.gz.

To use Ismael.pm, unpack the archive and place Ismael.pm in the same directory as the Perl script that uses it. Alternatively, put Ismael.pm in one of the directories listed in @INC, which you can display by running "perl -V".
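
Note that Perl 5.26 and later no longer include the current directory in @INC by default, so a script may have to add its own directory to @INC explicitly. Here is a minimal sketch of one way to do that with the core modules FindBin and lib. This is a general Perl idiom, not something Ismael.pm itself prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use FindBin qw($Bin);   # $Bin holds the directory this script lives in
use lib $Bin;           # put that directory at the front of @INC

# With Ismael.pm placed next to this script, the following line
# should now find it (commented out so the sketch runs without it):
# use Ismael;

# Show the resulting module search path, one directory per line.
print "$_\n" for @INC;
```
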

Ismael.pm requires the modules LWP::UserAgent and HTTP::Request.
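
Both modules are usually installed from CPAN (they ship in the libwww-perl and HTTP-Message distributions). To check whether they are already present, a small script along these lines may help. The helper name module_available is made up for this sketch and is not part of Ismael.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # LWP::UserAgent -> LWP/UserAgent.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report the status of the two modules Ismael.pm depends on.
for my $module (qw(LWP::UserAgent HTTP::Request)) {
    printf "%-16s %s\n", $module,
        module_available($module) ? 'found' : 'MISSING';
}
```
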

Ismael.pm's commands

Ismael.pm provides these commands:

Command: addJob
Parameters: url=>url, depth=>number, intention=>text, filter=>regex, dynamic=>on/off, local=>on/off, remote=>on/off, maxcheck=>on/off, maxpages=>number, autodom=>on/off, autodomdepth=>number, recrawl=>on/off, recrawltime=>number, recrawlunit=>year/month/day/hour/minute, stopword=>on/off
Return values: 1 - success, 0 - no success
Comments: This function adds a new job to the crawling queue of a peer. If username and password have been added to the properties, they don't need to be added to the arguments here. Values added to the arguments override values in the properties. Only url, the root of the crawl job, is required, provided that username and password have already been added to the properties. The url of the peer has to be set before this function can be executed. maxcheck and autodom are automatically set to "on" if values for autodomdepth or maxpages are given (unless they are explicitly set to "off").
Needs authentication: yes

Command: errstr
Parameters: none
Return values: a textual description of the last error
Comments: This function returns a textual description of the last error. The string is empty if no error occurred during the execution of the last function before calling errstr(). The description string is reset when a new function is called or after errstr() is called.
Needs authentication: no

Command: getData
Parameters: none
Return values: 1 - success, 0 - no success
Comments: This function refreshes the data which is known about the peer and which can be queried via getProperty and getProperties. The function is executed when the url of the object (= peer) is set. It has to be executed every time a refresh of the data is needed.
Needs authentication: no

Command: getProperties
Parameters: activecount, activelinks, activewords, passivecount, passivelinks, passivewords, potentialcount, potentiallinks, potentialwords, allcount, alllinks, allwords, authHash, pass, user, url, yourname, yourtype, yourversion, yourutc, youruptime, yourlinks, yourwords, youracceptcrawl, youracceptindex, yoursentwords, yoursenturls, yourreceivedwords, yourreceivesurls, yourppm, yourseeds, yourconnects
Return values: an array with the properties
Comments: This function can be used to get several properties from an object (= peer). One or more of the parameters can be used as arguments. The order of the returned array is determined by the order of the parameters.
Needs authentication: no

Command: getProperty
Parameters: activecount | activelinks | activewords | passivecount | passivelinks | passivewords | potentialcount | potentiallinks | potentialwords | allcount | alllinks | allwords | authHash | pass | user | url | yourname | yourtype | yourversion | yourutc | youruptime | yourlinks | yourwords | youracceptcrawl | youracceptindex | yoursentwords | yoursenturls | yourreceivedwords | yourreceivesurls | yourppm | yourseeds | yourconnects
Return values: a scalar with the property
Comments: This function can be used to get a single property from an object (= peer). Only one of the parameters can be used as an argument.
Needs authentication: no

Command: new
Parameters: url=>url, user=>username, pass=>password, authHash=>HaShVa1uE
Return values: a new object
Comments: This is the constructor of a new object (= peer). All of the parameters are optional and can be set later. url includes the URL and the port (http://server:port). authHash is the base64-encoded value of "username:password" that can be used instead of the username/password combination.
Needs authentication: no

Command: pauseCrawling
Parameters: none
Return values: 1 - success, 0 - no success
Comments: This function pauses crawling. If url, username and password have been added to the properties, they don't need to be added to the arguments here. Values added to the arguments override values in the properties.
Needs authentication: yes

Command: ping
Parameters: none
Return values: 1 - success, 0 - no success
Comments: This function pings a peer. It is called automatically before getting the peer's data when a URL is set in new() or setProperties(). Pinging a peer before each data request can speed up your programs if you would otherwise get lots of timeouts, but it also causes more traffic. That is why it is not called automatically before every request to a peer. You are free to use it or not.
Needs authentication: no

Command: resumeCrawling
Parameters: none
Return values: 1 - success, 0 - no success
Comments: This function resumes crawling. If url, username and password have been added to the properties, they don't need to be added to the arguments here. Values added to the arguments override values in the properties.
Needs authentication: yes

Command: search
Parameters: search=>text, count=>number, order=>YBR-Date-Quality | YBR-Quality-Date | Date-YBR-Quality | Quality-YBR-Date | Date-Quality-YBR, resource=>global | local, time=>number, urlmaskfilter=>regex, prefermaskfilter=>regex, topwords=>on/off, description=>on/off, indexpages=>on/off
Return values: an array with the results
Comments: This function starts a web search. It returns a two-dimensional array that holds the results. The array contains:

array[x][0]: link
array[x][1]: title
array[x][2]: description (unless explicitly turned off)
array[x][3]: date

If you turn topwords on, the last row of the array holds a comma-separated list of the topwords:

array[y][0]: topwords (in this case y = max(x)+1)
Needs authentication: no

Command: setProperties
Parameters: url=>url, user=>username, pass=>password, authHash=>HaShVa1uE
Return values: 1 - success, 0 - no success (URL can't be reached)
Comments: This function can be used either to add parameters that have not been set before or to change parameters that have been set before.
Needs authentication: no

Command: setTransferProperties
Parameters: indexDistribute=>on/off, indexDistributeWhileCrawling=>on/off, indexReceive=>on/off, indexReceiveBlockBlacklist=>on/off
Return values: 1 - success, 0 - no success
Comments: This function can be used to turn index transfer properties on or off. Properties that are not set explicitly when the function is called stay as they are.
Needs authentication: yes

Command: stop
Parameters: none
Return values: 1 - success, 0 - no success
Comments: This function stops a peer. If url, username and password have been added to the properties, they don't need to be added to the arguments here. Values added to the arguments override values in the properties.
Needs authentication: yes
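
The authHash value accepted by new and setProperties is described as the base64 encoding of "username:password". Here is a minimal sketch of how such a value could be computed with the core module MIME::Base64. The credentials are made up for illustration, and I am assuming the peer expects the encoded string without a trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical credentials, for illustration only.
my ($user, $pass) = ('admin', 'secret');

# The empty second argument suppresses the line break that
# encode_base64 would otherwise append to its output.
my $auth_hash = encode_base64(join(':', $user, $pass), '');

print "$auth_hash\n";   # prints YWRtaW46c2VjcmV0
```

The resulting string could then be passed as authHash=>$auth_hash instead of separate user and pass values.
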

Example

Here is an example that shows how the commands can be used.

#!/usr/bin/perl

#Tell the program to use Ismael.
use Ismael;

#Here we create a new object. The peer can be reached at http://4o4.dyndns.org:8080
$peer = Ismael->new(url=>'http://4o4.dyndns.org:8080');

#Here we add the peer owner's username and password.
$success = $peer->setProperties(user=>'admin',pass=>'xxxxxxxx');
if($success) {print "setProperties: OK\n";}
else {print "setProperties: ".$peer->errstr()."\n";}

#Here we ask for a single property.
$property = $peer->getProperty('yourppm');
print "PPM: $property\n";

#Here we ask for several properties.
($yourppm, $yourname) = $peer->getProperties('yourppm','yourname');
print "PPM: $yourppm, Peername: $yourname\n";

#Now we refresh the data of the peer.
$success = $peer->getData();
if($success) {print "getData: OK\n";}
else {print "getData: ".$peer->errstr()."\n";}

#Here we add a new job to the peer's crawling queue.
$success = $peer->addJob(url=>'http://yacy.net',depth=>'2',filter=>'.*yacy.*');
#BTW: The default filter is .*
if($success) {print "addJob: OK\n";}
else {print "addJob: ".$peer->errstr()."\n";}

#Here we add the same job again. An error message will be displayed.
$success = $peer->addJob(url=>'http://yacy.net',depth=>'2',filter=>'.*yacy.*');
#BTW: The default filter is .*
if($success) {print "addJob: OK\n";}
else {print "addJob: ".$peer->errstr()."\n";}

#Here we pause crawling.
$success = $peer->pauseCrawling();
if($success) {print "pauseCrawling: OK\n";}
else {print "pauseCrawling: ".$peer->errstr()."\n";}

#Here we resume crawling.
$success = $peer->resumeCrawling();
if($success) {print "resumeCrawling: OK\n";}
else {print "resumeCrawling: ".$peer->errstr()."\n";}

#Here we start a search
@search = $peer->search(search=>'spaghetti',time=>'6',resource=>'global',order=>'YBR-Date-Quality',urlmaskfilter=>'.*',topwords=>'off');
$i = 0;
$length = scalar(@search);
while($i < $length){
    print $search[$i][0]."\n".$search[$i][1]."\n".$search[$i][2]."\n".$search[$i][3]."\n\n";
    $i++;
}

#Here we start a search which also returns topwords
@search = $peer->search(search=>'spaghetti',topwords=>'on');
$i = 0;
$length = scalar(@search);
while($i < $length-1){
    print $search[$i][0]."\n".$search[$i][1]."\n".$search[$i][2]."\n".$search[$i][3]."\n\n";
    $i++;
}
# Print topwords...
if($search[$i][0]){
    @topwords = split /,/, $search[$i][0];
    for($j=0;$j < scalar(@topwords);$j++){
        print $topwords[$j]."\n";
    }
}

# Now we turn off index distribution
$success = $peer->setTransferProperties(indexDistribute=>'off');
if($success) {print "Turned off index distribution!\n";}
else {print "Error while turning off index distribution: ".$peer->errstr()."\n";}

# ... and now we turn it on again.
$success = $peer->setTransferProperties(indexDistribute=>'on');
if($success) {print "Turned on index distribution!\n";}
else {print "Error while turning on index distribution: ".$peer->errstr()."\n";}

#Let's ping!
if($peer->ping()){
    print "Peer is online!\n";
}
else{
    print "Peer is offline!\n";
}

#Now we stop the peer!
$success = $peer->stop();
if($success) {print "stop: OK\n";}
else {print "stop: ".$peer->errstr()."\n";}
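
The layout of the array returned by search (one row per hit, plus a final row with the topwords when topwords is turned on) can also be handled by detaching the last row before looping. The following sketch runs without a peer: the result data is made up to match the documented layout, including the date format, which is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample data in the documented layout: each hit is
# [link, title, description, date]; the final one-cell row holds the
# comma-separated topwords. With a real peer this would come from
# $peer->search(search=>'spaghetti', topwords=>'on').
my @search = (
    ['http://example.org/a', 'Pasta basics',  'How to cook pasta', '2008-11-01'],
    ['http://example.org/b', 'Spaghetti FAQ', 'Common questions',  '2008-10-15'],
    ['pasta,sauce,recipe'],
);

# Detach the topwords row so that every remaining row is a search hit.
my @topwords = split /,/, (pop @search)->[0];

for my $hit (@search) {
    my ($link, $title, $description, $date) = @$hit;
    print "$title ($date)\n  $link\n  $description\n";
}
print "Topwords: @topwords\n";
```

This only applies when topwords is turned on. Without it, every row is a hit and nothing should be popped.
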