Author: Marc Nause
Licence: LGPL
Version: 0.6.5 (29.November.2008) (changelog)
Ismael.pm is an easy-to-use Perl module that allows you to gather information about YaCy peers and to use several of their functions.
If you don't know what YaCy is, you probably don't have any need to use Ismael.pm yet. YaCy is a p2p-based distributed Web Search Engine. You can get more information regarding YaCy at http://yacy.net/.
Download Ismael.pm from http://ismael.audioattack.de/download/ismael-0.6.5.tar.gz.
To use Ismael.pm, simply unpack it and place it in the same directory as the Perl script that is supposed to use it. Alternatively, put Ismael.pm in one of the directories in @INC, which you can display by running "perl -V".
Ismael.pm requires the modules LWP::UserAgent and HTTP::Request.
Ismael.pm provides these commands:
Command | Parameters | Return Values | Comments | Needs Authentication |
---|---|---|---|---|
addJob | url=>url, depth=>number, intention=>text, filter=>regex, dynamic=>on/off, local=>on/off, remote=>on/off, maxcheck=>on/off, maxpages=>number, autodom=>on/off, autodomdepth=>number, recrawl=>on/off, recrawltime=>number, recrawlunit=>year/month/day/hour/minute, stopword=>on/off | 1 - success 0 - no success | This function adds a new job to the crawling queue of a peer. If username and password have been added to the properties, they don't need to be added to the arguments here. Values added to the arguments override values in the properties. None of the arguments is necessary, with the exception of url (the root of the crawl job), if username and password have been added to the properties before. The url of the peer has to be set before this function can be executed. maxcheck and autodom are automatically set to "on" if values for autodomdepth or maxpages are given (unless they are explicitly set to "off"). | yes |
errstr | - | a textual description of the last error | This function returns a textual description of the last error. The string will be empty if no error occurred during the execution of the last function called before errstr(). The description string gets reset when a new function is called or after errstr() is called. | no |
getData | - | 1 - success 0 - no success | This function refreshes the data known about the peer, which can be queried via getProperty and getProperties. It is executed automatically when the url of the object (= peer) is set and has to be executed every time a refresh of the data is needed. | no |
getProperties | activecount, activelinks, activewords, passivecount, passivelinks, passivewords, potentialcount, potentiallinks, potentialwords, allcount, alllinks, allwords, authHash, pass, user, url, yourname, yourtype, yourversion, yourutc, youruptime, yourlinks, yourwords, youracceptcrawl, youracceptindex, yoursentwords, yoursenturls, yourreceivedwords, yourreceivesurls, yourppm, yourseeds, yourconnects | an array with the properties | This function can be used to get several properties from an object (= peer). That means that one or more parameters can be used as an argument. The order of the returned array is determined by the order of the parameters. | no |
getProperty | one of: activecount, activelinks, activewords, passivecount, passivelinks, passivewords, potentialcount, potentiallinks, potentialwords, allcount, alllinks, allwords, authHash, pass, user, url, yourname, yourtype, yourversion, yourutc, youruptime, yourlinks, yourwords, youracceptcrawl, youracceptindex, yoursentwords, yoursenturls, yourreceivedwords, yourreceivesurls, yourppm, yourseeds, yourconnects | a scalar with the property | This function can be used to get a single property from an object (= peer). Only one of the parameters can be used as an argument. | no |
new | url=>url, user=>username, pass=>password, authHash=>HaShVa1uE | a new object | This is the constructor of a new object (= peer). All of the parameters are optional and can be set later. url includes the URL and the port (http://server:port). authHash is the base64-encoded value of "username:password" that can be used instead of the username/password combination. | no |
pauseCrawling | - | 1 - success 0 - no success | This function pauses crawling. If url, username and password have been added to the properties, they don't need to be added to the arguments here. Values added to the arguments override values in the properties. | yes |
ping | - | 1 - success 0 - no success | This function pings a peer. It is called automatically before getting the peer's data when a URL is set in new() or setProperties(). Pinging a peer every time before you request data from it can speed up your programs (if you get lots of timeouts otherwise), but it also causes more traffic. That's why it is not called before every request to a peer. You're free to either use it or not. | no |
resumeCrawling | - | 1 - success 0 - no success | This function resumes crawling. If url, username and password have been added to the properties, they don't need to be added to the arguments here. Values added to the arguments override values in the properties. | yes |
search | search=>text, count=>number, order=>YBR-Date-Quality/YBR-Quality-Date/Date-YBR-Quality/Quality-YBR-Date/Date-Quality-YBR, resource=>global/local, time=>number, urlmaskfilter=>regex, prefermaskfilter=>regex, topwords=>on/off, description=>on/off, indexpages=>on/off | an array with the results | This function starts a web search. It returns a two-dimensional array that holds the results: array[x][0]: link, array[x][1]: title, array[x][2]: description (unless explicitly turned off), array[x][3]: date. If you choose topwords, the last cell of the array will hold a comma-separated list of the topwords: array[y][0]: topwords (in this case y = max(x)+1). | no |
setProperties | url=>url, user=>username, pass=>password, authHash=>HaShVa1uE | 1 - success 0 - no success (URL can't be reached) | This function can be used to either add parameters that have not been set before or to change parameters that have been set before. | no |
setTransferProperties | indexDistribute=>on/off, indexDistributeWhileCrawling=>on/off, indexReceive=>on/off, indexReceiveBlockBlacklist=>on/off | 1 - success 0 - no success | This function can be used to turn index transfer properties on or off. Properties that are not set explicitly when the function is called stay as they are. | yes |
stop | - | 1 - success 0 - no success | This function stops a peer. If url, username and password have been added to the properties, they don't need to be added to the arguments here. Values added to the arguments override values in the properties. | yes |
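Since new() and setProperties() accept an authHash (the base64 encoding of "username:password"), the value can be precomputed on the command line. A minimal sketch with made-up credentials:

```shell
# Encode "username:password" as base64 to obtain the authHash.
# "admin" and "secret" are placeholder credentials.
printf 'admin:secret' | base64
```

The printed value (here YWRtaW46c2VjcmV0) can then be passed as authHash=>'...' instead of the user and pass parameters.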
Here is an example that shows how the commands can be used.
```perl
#!/usr/bin/perl

# Tell the program to use Ismael.
use Ismael;

# Here we create a new object. The peer can be reached at
# http://4o4.dyndns.org:8080
$peer = Ismael->new(url=>'http://4o4.dyndns.org:8080');

# Here we add the peer owner's username and password.
$success = $peer->setProperties(user=>'admin',pass=>'xxxxxxxx');
if($success) {print "setProperties: OK\n";}
else {print "setProperties: ".$peer->errstr()."\n";}

# Here we ask for a single property.
$property = $peer->getProperty('yourppm');
print "PPM: $property\n";

# Here we ask for several properties.
($yourppm, $yourname) = $peer->getProperties('yourppm','yourname');
print "PPM: $yourppm, Peername: $yourname\n";

# Now we refresh the data of the peer.
$success = $peer->getData();
if($success) {print "getData: OK\n";}
else {print "getData: ".$peer->errstr()."\n";}

# Here we add a new job to the peer's crawling queue.
# BTW: The default filter is .*
$success = $peer->addJob(url=>'http://yacy.net',depth=>'2',filter=>'.*yacy.*');
if($success) {print "addJob: OK\n";}
else {print "addJob: ".$peer->errstr()."\n";}

# Here we add the same job again. An error message will be displayed.
$success = $peer->addJob(url=>'http://yacy.net',depth=>'2',filter=>'.*yacy.*');
if($success) {print "addJob: OK\n";}
else {print "addJob: ".$peer->errstr()."\n";}

# Here we pause crawling.
$success = $peer->pauseCrawling();
if($success) {print "pauseCrawling: OK\n";}
else {print "pauseCrawling: ".$peer->errstr()."\n";}

# Here we resume crawling.
$success = $peer->resumeCrawling();
if($success) {print "resumeCrawling: OK\n";}
else {print "resumeCrawling: ".$peer->errstr()."\n";}

# Here we start a search.
@search = $peer->search(search=>'spaghetti',time=>'6',resource=>'global',order=>'YBR-Date-Quality',urlmaskfilter=>'.*',topwords=>'off');
$i = 0;
$length = scalar(@search);
while($i < $length){
    print $search[$i][0]."\n".$search[$i][1]."\n".$search[$i][2]."\n".$search[$i][3]."\n\n";
    $i++;
}

# Here we start a search which also returns topwords.
@search = $peer->search(search=>'spaghetti',topwords=>'on');
$i = 0;
$length = scalar(@search);
while($i < $length-1){
    print $search[$i][0]."\n".$search[$i][1]."\n".$search[$i][2]."\n".$search[$i][3]."\n\n";
    $i++;
}
# Print topwords...
if($search[$i][0]){
    @topwords = split /,/, $search[$i][0];
    for($j=0;$j < scalar(@topwords);$j++){
        print $topwords[$j]."\n";
    }
}

# Now we turn off index distribution...
$success = $peer->setTransferProperties(indexDistribute=>'off');
if($success) {print "Turned off index distribution!\n";}
else {print "Error while turning off index distribution: ".$peer->errstr()."\n";}

# ...and now we turn it on again.
$success = $peer->setTransferProperties(indexDistribute=>'on');
if($success) {print "Turned on index distribution!\n";}
else {print "Error while turning on index distribution: ".$peer->errstr()."\n";}

# Let's ping!
if($peer->ping()){
    print "Peer is online!";
}
else{
    print "Peer is offline!";
}

# Now we stop the peer!
$success = $peer->stop();
if($success) {print "stop: OK\n";}
else {print "stop: ".$peer->errstr()."\n";}
```