By Dave Johnstone
PixieRobot has several automated and unattended functions, which can be run interactively or in a scheduled mode.
This page describes PixieRobot as a web robot: a program that visits web sites, reads their pages, inspects the HTML, follows instructions or rules written in scripts, extracts data, and saves that data in a more structured format. The robot visits only the pages it needs, so it collects just the targeted data and works efficiently in an offline, unattended mode. PixieRobot drives Internet Explorer (IE) itself, so a web server sees the same requests it would receive from a human user browsing with IE.
Data found on the WWW is an important resource. Once published, it is available to many people and organisations and can be accessed with a wide range of tools (e.g. a browser). PixieRobot is another tool for using Web pages as a source of data: it intelligently and accurately "browses" the HTML content of standard Web pages.
Web browsing is the process of opening a Web page via the Internet and displaying its data. The data is visually clear to anyone looking at the page, but it is buried in the underlying HTML code, which mixes presentation rules with data. The purpose of PixieRobot is to capture the page contents and make them available to your "script" program, which deciphers the embedded data. PixieRobot also extends the basic VBSCRIPT language with a library of functions that make it easier to identify the structural elements of a Web page. Your script can read these elements and also manipulate them to carry out the next step of a conversation with that web site.
PixieRobot runs the specified script, which calls a built-in function (ExecuteWWW) to retrieve the HTML of a selected Web page. The HTML is returned as a text string ("s") that can be searched for the pre-pattern and post-pattern sub-strings surrounding the desired data; the text between the two patterns is the data to be extracted. The extracted data may itself be manipulated (e.g. converting the text to a numeric value) and then written to another file. If the default output file is an .XLS file, the extracted raw HTML can be fed directly into a spreadsheet and formatted automatically. This .XLS technique simplifies defining delimited pattern searches (e.g. there is no need to find and remove <td> tags).
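The pre-pattern/post-pattern idea, plus the conversion of extracted text to a number, can be sketched as follows. This is written in Python rather than VBSCRIPT and uses a regular expression in place of PixieRobot's Instr/Mid calls, so it illustrates the technique only; it is not PixieRobot's API. The function names `extract_between` and `to_number`, the sample HTML, and the price cleanup rules are all hypothetical.

```python
import re
from typing import Optional


def extract_between(s: str, pre: str, post: str) -> Optional[str]:
    """Return the text between the first `pre` and the next `post` after it.

    Matching ignores case (like VBScript Instr with compare = 1) and spans
    line breaks. Returns None if either pattern is missing.
    """
    pattern = re.escape(pre) + r"(.*?)" + re.escape(post)  # lazy: stop at first `post`
    m = re.search(pattern, s, re.IGNORECASE | re.DOTALL)
    return m.group(1) if m else None


def to_number(text: str) -> float:
    """Turn extracted text such as ' $1,234.50 ' into a float.

    The cleanup rules (strip spaces, a leading '$' and thousands commas) are
    assumptions for this example; real pages need rules matched to their format.
    """
    return float(text.strip().lstrip("$").replace(",", ""))


# Hypothetical page fragment standing in for the string returned by ExecuteWWW.
html = '<tr><td>Widget</td><td class="price">$1,234.50</td></tr>'
raw = extract_between(html, '<td class="price">', "</td>")
print(raw)
print(to_number(raw))
```

Returning `None` when a pattern is missing makes a changed page layout show up immediately, instead of silently extracting the wrong text.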
Data from a source transaction file can also be read and inserted into the fields of a Web page form, automating a Web page update (submit) process (e.g. submitting product sales or purchase orders into another system, or submitting a product to an auction).
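PixieRobot fills the form fields inside IE itself. As a minimal Python sketch of the data side of that process only, the example below reads rows from a transaction file and turns each one into an `application/x-www-form-urlencoded` body, the encoding a browser uses when it submits a standard HTML form. Sending the request is not shown. The CSV columns, the form field names in `FIELD_MAP`, and the helper `form_bodies` are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical transaction file: one purchase order per row.
TRANSACTIONS = """order_id,sku,quantity
1001,WID-01,5
1002,GAD-07,12
"""

# Hypothetical mapping from transaction columns to the form's input names.
FIELD_MAP = {"order_id": "po_number", "sku": "product_code", "quantity": "qty"}


def form_bodies(source: str, field_map: dict) -> list:
    """Build one URL-encoded form body per transaction row."""
    bodies = []
    for row in csv.DictReader(io.StringIO(source)):
        # Rename each transaction column to the matching form field name.
        fields = {form_name: row[col] for col, form_name in field_map.items()}
        bodies.append(urlencode(fields))
    return bodies


for body in form_bodies(TRANSACTIONS, FIELD_MAP):
    print(body)
```

Keeping the column-to-field mapping in one table means a change to the target form only requires editing `FIELD_MAP`, not the loop.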
Before any data (here, price data) can be extracted, the HTML code patterns around it must be analysed and encoded in a PixieRobot script. For example, the following VBSCRIPT scans the raw HTML for the string "Prices are subject", then finds the <table> tag that follows it and the next </table> tag. The table, including its tags, is extracted for subsequent use (e.g. stored in an MS_Excel table).
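' PixieRobot run settings
Monitor = True
Silent = True
' Fetch the page and return its raw HTML as one string
s = ExecuteWWW("http://www.pixieware.com/totprices.htm")
' Find the marker text, then the <table tag after it (compare mode 1 = case-insensitive)
iPos = Instr(1, s, "Prices are subject", 1)
If iPos > 0 Then iPos = Instr(iPos, s, "<table ", 1)
If iPos > 0 Then
    s = Mid(s, iPos)                       ' drop everything before "<table"
    iPos = Instr(1, s, "</table", 1)
    ' "</table>" is 8 characters long, so keep up to position iPos + 7
    If iPos > 0 Then s = Left(s, iPos + 7)
    Call OutputToFile(s, "prices.xls")
End If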
The extracted string (held in variable "s") can be written to the nominated output file or emailed to a recipient mailbox. Giving the output file an .xls extension causes MS_Excel to read the extracted HTML and lay the data out in rows and columns, much as a browser renders the table.
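With .xls output, MS_Excel does the row and column layout. Where Excel is not available, the same step can be sketched in Python with the standard `html.parser` module, converting an extracted table into CSV. This is an alternative technique, not part of PixieRobot. It assumes a simple, well-formed table with no nested tables and no omitted closing tags. The class `TableRows`, the function `table_to_csv`, and the sample table are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments (e.g. text split by <b> tags) and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(table_html: str) -> str:
    """Parse an HTML table and return its cells as CSV text."""
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


sample = """<table border=1>
<tr><th>Item</th><th>Price</th></tr>
<tr><td>Widget</td><td>$1,234.50</td></tr>
</table>"""
print(table_to_csv(sample))
```

The `csv` writer quotes any cell containing a comma (such as `$1,234.50`), so the column layout survives when the CSV is opened in a spreadsheet.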
When PixieRobot runs interactively, windows appear on the desktop for controlling its behaviour, for example to change the script to be run. PixieRobot indicates when it has started and finished, and it displays the selected URL to confirm that the page is available for processing, which gives a simple check that the script is working.
"NOTE: If PixieRobot is used to extract (scrape) data from a web-site, we expect all users of PixieRobot to read and comply with that site's existing legal (patent, copyright, trademark) and any other intellectual property law assertions made by the site owners, in respect of their web-site contents usage." This message does not constitute legal advice and is not a substitute for the professional judgment of an attorney should you need assistance.