<?xml version="1.0" encoding="UTF-8"?>
<webscript name="test ws">
  <gatherdata>
    <webcommands name="google webnavigator">
      <setUp value="http://www.google.com"/>
      <beginAt value="/"/>
      <setFormElement name="q" value="webnavigator"/>
      <submit value="btnG"/>
      <clickLinkWithText value="webnavigator"/>
      <result_selectTableStartingWithPrefix value="news"/>
      <result_setIfNew/>
    </webcommands>
    <webcommands name="altavista webnavigator">
      ......
    </webcommands>
  </gatherdata>
  <sendgathereddata email="firstname.lastname@example.org"/>
</webscript>
WebCommands interact with HTML pages as if they were driving a web browser, so it is possible to:
- set form element values
- set options
- click images
- click text
- click buttons
- submit pages
To ease the creation of templates, the commands directly use the Java methods exposed by httpunit.
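One way to picture this mapping is a small reflection-based dispatcher that treats each element name as a method name. The sketch below is only an illustration under stated assumptions: `RecordingBrowser` and `CommandDispatchSketch` are hypothetical names, the recorder stands in for the real browser driver and only logs calls, and the rule "the `name` and `value` attributes become string arguments, in that order" is a guess at the convention, not WebNavigator's documented behavior.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical stand-in for the browser driver: it only records the calls it receives. */
class RecordingBrowser {
    final List<String> calls = new ArrayList<>();

    public void beginAt(String path) { calls.add("beginAt(" + path + ")"); }
    public void setFormElement(String name, String value) {
        calls.add("setFormElement(" + name + ", " + value + ")");
    }
    public void submit(String button) { calls.add("submit(" + button + ")"); }
    public void clickLinkWithText(String text) { calls.add("clickLinkWithText(" + text + ")"); }
}

/** Hypothetical dispatcher: element name = method name, attributes = String arguments. */
class CommandDispatchSketch {
    static void run(String webcommandsXml, Object target) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(webcommandsXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and other non-element nodes
                }
                Element command = (Element) node;
                // Assumed convention: "name" first, then "value", both passed as Strings.
                List<String> args = new ArrayList<>();
                if (command.hasAttribute("name")) args.add(command.getAttribute("name"));
                if (command.hasAttribute("value")) args.add(command.getAttribute("value"));
                Class<?>[] types = new Class<?>[args.size()];
                Arrays.fill(types, String.class);
                // Look up the public method with the same name as the element and invoke it.
                Method method = target.getClass().getMethod(command.getTagName(), types);
                method.invoke(target, args.toArray());
            }
        } catch (Exception e) {
            throw new IllegalStateException("cannot run web commands", e);
        }
    }
}
```

Calling `CommandDispatchSketch.run(xml, new RecordingBrowser())` on a `<webcommands>` fragment leaves the invoked calls, in document order, in the recorder's `calls` list. A command with no matching method fails with an `IllegalStateException`, which keeps typos in a template from passing silently.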
Store
A persistent store caches the results and supports the differencing analysis that identifies updated results. WebNavigator uses hsqldb.
Scheduler
Takes care of scheduling repetitive searches.
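As a rough illustration of repetitive scheduling, the sketch below uses the standard `java.util.concurrent` scheduled executor. The class name `SearchSchedulerSketch`, its methods, and the idea of passing the search as a `Runnable` are assumptions made for the example; the text does not say how WebNavigator's scheduler is actually built.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: re-runs a search task at a fixed interval. */
class SearchSchedulerSketch {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    /**
     * Runs the task once immediately, then again every periodMillis milliseconds.
     * Note: if the task throws, scheduleAtFixedRate suppresses all later runs,
     * so a real search task should catch and log its own failures.
     */
    ScheduledFuture<?> scheduleRepeating(Runnable searchTask, long periodMillis) {
        return executor.scheduleAtFixedRate(searchTask, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops scheduling further runs; a run already in progress is allowed to finish. */
    void shutdown() {
        executor.shutdown();
    }
}
```

A single-threaded executor is used here so that two runs of the same search never overlap.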
Result Filter
The basic idea is to determine whether the last search reported new results. At present, there are two groups of functions in this area:
- identify the Result Area (the area of the page that presents results, separated from the surrounding information). Right now it is possible to define the result area as an HTML table starting with some text, or as the whole page.
- identify the differences from the previous result set. Right now, we are exploring XML differencing. This requires, at least, an XML representation of the HTML page.
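The two steps above can be sketched as follows. This is a deliberately naive illustration and not the project's code: `ResultFilterSketch` and its method names are hypothetical, the table is located with regular expressions (which break on nested tables), and the comparison is a plain set difference over row text rather than the XML differencing the project is exploring.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of result-area selection and new-result detection. */
class ResultFilterSketch {
    private static final Pattern TABLE = Pattern.compile(
            "<table\\b[^>]*>(.*?)</table>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern ROW = Pattern.compile(
            "<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Strips tags and collapses whitespace to approximate the visible text. */
    static String text(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    /** Returns the inner HTML of the first table whose text starts with prefix, or null. */
    static String selectTableStartingWithPrefix(String html, String prefix) {
        Matcher m = TABLE.matcher(html);
        while (m.find()) {
            if (text(m.group(1)).startsWith(prefix)) {
                return m.group(1);
            }
        }
        return null; // no matching table: a caller could fall back to the whole page
    }

    /** Returns the text of each table row, in page order. */
    static List<String> rows(String tableHtml) {
        List<String> out = new ArrayList<>();
        Matcher m = ROW.matcher(tableHtml);
        while (m.find()) {
            out.add(text(m.group(1)));
        }
        return out;
    }

    /** Returns the rows of current that did not appear in previous (simplified diff). */
    static List<String> newRows(List<String> previous, List<String> current) {
        Set<String> seen = new HashSet<>(previous);
        List<String> fresh = new ArrayList<>();
        for (String row : current) {
            if (!seen.contains(row)) {
                fresh.add(row);
            }
        }
        return fresh;
    }
}
```

A search would count as having new results when `newRows` returns a non-empty list for the stored and freshly fetched result areas.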