Creating a search script is not an easy task, and it depends heavily on
the structure of the site you are going to query. The following steps
give some ideas on how to proceed.
<?xml version="1.0" encoding="UTF-8"?>
<webscript xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:noNamespaceSchemaLocation='webscript.xsd' name='demo shop sites'>
  <gatherdata>
    <webcommands name="kelkoo">
      <setUp value="http://www.kelkoo.co.uk"/>
      <beginAt value="/"/>
      <setFormElement name="siteSearchQuery" value="toilet roll"/>
      <submit/>
      <!-- the result area and data structure are added in the next steps -->
    </webcommands>
  </gatherdata>
</webscript>
To find the name of the input field, open the page in an HTML editor (we
use Mozilla) and click on the INPUT field. That name is the value of the
name attribute in <setFormElement> (here, siteSearchQuery).
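As a rough stand-in for what the HTML editor shows, the sketch below uses Python's standard html.parser module to list the name attribute of every input tag on a page. The form markup is hypothetical; the real Kelkoo page will differ, and InputNameCollector is an illustrative name, not part of the webscript tooling.

```python
from html.parser import HTMLParser
from typing import List


class InputNameCollector(HTMLParser):
    """Collects the name attribute of every <input> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.names: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names and passes attrs as (name, value) pairs.
        if tag == "input":
            attrs_dict = dict(attrs)
            if "name" in attrs_dict:
                self.names.append(attrs_dict["name"])


# Hypothetical search form standing in for the real page markup.
PAGE = """<form action="/search" method="get">
  <input type="text" name="siteSearchQuery">
  <input type="submit" value="Search">
</form>"""

collector = InputNameCollector()
collector.feed(PAGE)
print(collector.names)  # ['siteSearchQuery']
```

The submit button has no name attribute, so only the text field is reported.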
2: Define the result area
First, identify the result area visually. Then use an HTML editor to find the exact starting and ending strings that delimit the result area, and write a regular expression that selects the whole area. Use the group expression (.)* to represent the content of the result area.
<result_selectRegEx>
  <![CDATA[ <div class="mod_std_sub">(.)*<div id="pages" class="pageDiv"> ]]>
</result_selectRegEx>
To check the regular expression you typed, use a regex editor (we use the QuickREx Eclipse plugin).
Verify that the highlighted area is what you expected.
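If no regex editor is at hand, a few lines of Python can play the same role. This is a sketch that assumes Python's re module behaves like the script engine's regex engine, which the original does not state. In particular, re.DOTALL is added so that "." also matches newlines, since a result area usually spans several lines of HTML. Because (.)* captures only the last character in its group, the sketch returns the whole match. The sample page is hypothetical.

```python
import re
from typing import Optional

# Result-area expression from step 2, copied verbatim.
RESULT_AREA = r'<div class="mod_std_sub">(.)*<div id="pages" class="pageDiv">'

# Hypothetical page fragment standing in for a real result page.
SAMPLE_PAGE = """<html><body>
<div class="header">Search</div>
<div class="mod_std_sub">
  <div class="width"> Product A </div>
  <div class="width"> Product B </div>
<div id="pages" class="pageDiv">1 2 3</div>
</body></html>"""


def select_result_area(page: str) -> Optional[str]:
    # DOTALL lets "." cross line breaks (assumed to match the engine's behaviour).
    match = re.search(RESULT_AREA, page, re.DOTALL)
    # group(0) is the whole selected area; group(1) would be a single character.
    return match.group(0) if match else None


area = select_result_area(SAMPLE_PAGE)
print(area)
```

Without re.DOTALL the same expression finds nothing in this sample, because the area spans several lines.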
3: Define the result data structure
This step can be tricky. If the results are laid out in tables, or in rows of a table, use the corresponding element:
<result_define_data_structure_as_tables/>
That is the case here, but as an example, suppose we want to define the result data structure with a regular expression instead:
<result_define_data_structure_as_regex> <![CDATA[ <div class="width">\s* ]]> </result_define_data_structure_as_regex>
This tells the parsing engine that each result runs from one match of
<div class="width">\s* to the next match of the same expression (we
call this the start/end strategy).
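The start/end strategy can be sketched in Python as below. Two points are assumptions, since the document does not say how the engine handles them: each result includes its starting delimiter, and the last result runs to the end of the result area. The function name split_start_end is hypothetical.

```python
import re
from typing import List

# Delimiter from step 3, copied verbatim.
RESULT_START = re.compile(r'<div class="width">\s*')


def split_start_end(area: str) -> List[str]:
    """Split a result area into results, one per match of RESULT_START.

    Each result runs from one match to the next; the last result is
    assumed to run to the end of the area.
    """
    starts = [m.start() for m in RESULT_START.finditer(area)]
    ends = starts[1:] + [len(area)]
    return [area[s:e] for s, e in zip(starts, ends)]


area = ('<div class="width">\n  Product A </div>\n'
        '<div class="width"> Product B </div>\n')
for result in split_start_end(area):
    print(repr(result))
```

Only the start of each result has to be described, which is why this strategy copes well with loosely structured HTML.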
It is also possible to define each result with a group expression (we
call this the group strategy). Note, however, that it is not easy to
balance HTML tags with regular expressions, so a group expression can
stop at the wrong closing tag when tags are nested.
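The balancing problem is easy to reproduce. The sketch below uses a hypothetical group expression (not one taken from the original script) that captures everything between <div class="width"> and the following </div>. The lazy (.*?) stops at the first </div>, so a result that contains a nested div is cut short.

```python
import re

# Hypothetical group expression: one result = text between the opening
# <div class="width"> and the first </div> after it.
RESULT_GROUP = re.compile(r'<div class="width">\s*(.*?)</div>', re.DOTALL)

area = ('<div class="width"> Product A </div>'
        '<div class="width"> Product B <div class="price">2.99</div> </div>')

# The second result ends at the nested price div's closing tag,
# not at the closing tag of its own <div class="width">.
print(RESULT_GROUP.findall(area))
```

The first result is captured correctly; the second loses everything after the nested closing tag.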
4: Upload and run the script
Use the script management menu to upload the script, run it, and examine both the complete results and the detailed parsed data. The complete script we have created is:
<?xml version="1.0" encoding="UTF-8"?>
<webscript xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:noNamespaceSchemaLocation='webscript.xsd' name='demo shop sites'>
  <gatherdata>
    <webcommands name="kelkoo">
      <setUp value="http://www.kelkoo.co.uk"/>
      <beginAt value="/"/>
      <setFormElement name="siteSearchQuery" value="toilet roll"/>
      <submit/>
      <!-- define result area -->
      <result_selectRegEx>
        <![CDATA[ <div class="mod_std_sub">(.)*<div id="pages" class="pageDiv"> ]]>
      </result_selectRegEx>
      <!-- define result data structure -->
      <result_define_data_structure_as_tables/>
      <result_setIfNew/>
    </webcommands>
  </gatherdata>
</webscript>
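Before uploading, it can help to check the script locally. A missing "<" in front of a command does not always make the XML ill-formed; it can simply turn the command into plain text. The sketch below, using Python's standard xml.etree.ElementTree, parses the script (raising ParseError if it is not well-formed) and lists the command elements so that such slips become visible. It does not validate against webscript.xsd, and list_commands is a hypothetical helper. The script is the one above with its comments omitted.

```python
import xml.etree.ElementTree as ET
from typing import List

# The complete script from step 4, comments omitted.
SCRIPT = """<?xml version="1.0" encoding="UTF-8"?>
<webscript xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
           xsi:noNamespaceSchemaLocation='webscript.xsd' name='demo shop sites'>
  <gatherdata>
    <webcommands name="kelkoo">
      <setUp value="http://www.kelkoo.co.uk"/>
      <beginAt value="/"/>
      <setFormElement name="siteSearchQuery" value="toilet roll"/>
      <submit/>
      <result_selectRegEx>
        <![CDATA[ <div class="mod_std_sub">(.)*<div id="pages" class="pageDiv"> ]]>
      </result_selectRegEx>
      <result_define_data_structure_as_tables/>
      <result_setIfNew/>
    </webcommands>
  </gatherdata>
</webscript>"""


def list_commands(script: str) -> List[str]:
    # Parsing bytes lets the parser honour the encoding declaration.
    # ET.ParseError is raised if the script is not well-formed.
    root = ET.fromstring(script.encode("utf-8"))
    commands = root.find("gatherdata/webcommands")
    return [child.tag for child in commands]


print(list_commands(SCRIPT))
```

If setUp or beginAt is missing from the printed list, look for a dropped "<" in front of it.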