PixieWare Software: PixieRobot in Detail
Website Design and Data Extract Solutions

By Dave Johnstone



PixieRobot in Detail

PixieRobot has several automated and unattended functions, which can be run interactively or in a scheduled mode.

  • Processing "Web Pages" - Used to gather data from the web and store it on a local machine for subsequent processing, for example storing it in a database or emailing it to another system.
  • Processing "Email File Transfers" - Used to send/receive emails and attachments, containing data files, between systems. Attachments could contain such things as database updates, web extracts, or source code. Application systems can drop off or receive files from the PixieRobot message folders.
  • PixieRobot in combination with PixieWeb can be used to move data in and out of PICK systems. An example script can be found here: PixieRobot PICK extract script.
  • PLUS - We can write extract scripts for you. The scripts are yours to keep so you can run as many times as required. We charge per script, NOT per transaction extracted.

    This page describes PixieRobot as a web robot: a program that visits web-sites, reads their pages, inspects the HTML, processes instructions or rules written in scripts, extracts data, and saves the data in another, more structured format. The robot visits only the pages it needs and collects only targeted data, which keeps processing efficient, and it runs in an off-line, unattended mode. PixieRobot drives Internet Explorer (IE) itself, so its requests look to a web server like those of a human user.

    Data found on the WWW is an important resource. Once it is published, it is available to many people and organisations, and may be interacted with using a wide range of tools (e.g. a browser). PixieRobot is another tool for utilising Web pages as a source of data. It is used to intelligently and accurately "browse" the HTML content on standard Web pages.

    Web browsing is the process of opening a Web page via the Internet and displaying the data from that Web page. The data is visually clear to any person looking at the Web page, but it is buried within the underlying HTML code, which is a mixture of presentation rules and data. The purpose of PixieRobot is to capture such page contents and make them available to your "script" program, which deciphers the embedded data. PixieRobot also enhances the basic VBScript language with a library of functions that make it easier to identify the structural elements of a Web page. You can read these elements and also manipulate them to carry on the next step of a conversation with that web site.

    Major Benefits of PixieRobot are:

    • Reducing people costs by using specialised scripts
    • Speeding up the process by automating the task
    • Eliminating errors and improving accuracy of the extracted data

    Distinctive features of PixieRobot are:

    • Automated and unattended
    • Navigation to web pages and drilling down
    • Filling-in of forms and submission of them
    • Capture of pictures and data
    • Output of the picture and data for other uses

    Detailed Features

    • Whatever a browser can do, PixieRobot can do.
    • PixieRobot can process web pages and send extracted results by e-mail.
    • Processing scripts are written in VBScript.
    • PixieRobot identifies and reports on error conditions encountered during script execution.
    • The data and pictures extracted, or the status of the automated browsing activities may be emailed automatically to a nominated recipient.
    • PixieRobot contains a built in timer for scheduling its activities.
    • PixieRobot can update data on a Web page as well as extract it.
    • Two interfaces: a standalone .EXE version, or functions called from .DLLs.
    • PixieRobot displays a mini icon in the "system tray" on the desk-top so you can run it manually, e.g. for testing scripts or script changes.
    • Write the scripts yourself or let us do it. We can help collect and structure the data you need, from the sites you want, when you want. We will store data in a structured format such as CSV, XML, or database.
    • Many other features including:
      - Popup Manager
      - Runs on Win 95/98/ME/NT/2000/XP/Win7
      - Results instantly saved on local PC
      - Save complete web pages
      - Build loops to process lists of any size
      - Extracts tabular and non-tabular data including text or html
      - Extracts pictures
      - Can negotiate frames
      - Testing made easy with Web page viewer
      - Inexpensive to buy
      - 30 day evaluation period
      - Free email support

    Possible Applications

    General Uses

    • Targeted Web screen scraping that processes the HTML of a web page to extract filtered and meaningful data. 
    • Scanning websites for changes in data by comparing to previous values stored externally to the web page.
    • Automated update process to fill out registration or data capture forms.
    • Web page monitoring and testing - Watch your web-site and issue an alert if any problem is encountered. Can test online forms of any complexity (e.g. create test orders in an on-line store).
    • Navigates complex web-sites repeatedly without user intervention and can penetrate deep into web-sites to collect the required data.
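
    The change-scanning use above comes down to comparing freshly extracted values with the ones saved from the previous run. A minimal sketch of that comparison follows, written in Python rather than PixieRobot's VBScript so it can be run anywhere. The function name find_changes and the sample data are hypothetical and are not part of PixieRobot.

```python
def find_changes(previous, current):
    """Compare two {name: value} snapshots and return what differs.

    The result maps each changed name to an (old, new) pair; a name that
    is missing from one snapshot shows up with None on that side.
    """
    changes = {}
    for name in sorted(set(previous) | set(current)):
        old = previous.get(name)
        new = current.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Hypothetical snapshots: values stored by the last run vs. values just extracted.
previous = {"widget": "9.99", "gadget": "24.50"}
current = {"widget": "10.49", "gadget": "24.50", "gizmo": "5.00"}

for name, (old, new) in find_changes(previous, current).items():
    print(f"{name}: {old} -> {new}")
```

    Keeping the previous values outside the web page (a file or database) is what makes the comparison possible from one run to the next.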

    More Specific Uses

    • Extracting Financial Information: Researchers, Traders, Money Managers, and Publishers can have timely access to original information. Markets include: Commodities, Energy, Credit, Exchange Rates. Data can be extracted for use in other commercial applications or databases.
    • E-Commerce/Supply Chain Integration: Inventory, Shipping, Product Information, and Process Status can all be monitored. Relevant data, such as changes since the previous access, can be used in application integrations, and the extraction can be performed quickly and inexpensively.
    • Product availability checking - check the product availability on different sites in order to make purchasing (replenishment) decisions.
    • Keep an eye on your competition: pricing, new products, and services.
    • Many sites publish sales/usage statistics - these statistics can be harvested in real time.
    • You can schedule an extract, so that every hour (for example) a nominated web page will be probed. Whenever a value changes, it can be written into e.g. Excel for charting. You can reliably record a change history every hour over many months. An alert can also be raised if something (a price) changes drastically in a short period of time.
    • You can traverse a web-site scanning each page to extract keywords, or to check for bad URL links.
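
    The scheduled-extract use above records a value only when it changes and raises an alert when the change is drastic. A rough sketch of that logic, in Python rather than PixieRobot's VBScript, is shown here. The function record_reading, the 10% threshold, and the sample readings are all assumptions made for illustration.

```python
def record_reading(history, timestamp, value, alert_threshold=0.10):
    """Append (timestamp, value) to history if the value changed.

    Returns True when the change from the previously recorded value is
    larger than alert_threshold (a fraction, so 0.10 means 10%).
    """
    if history and history[-1][1] == value:
        return False  # unchanged: nothing to record, nothing to alert on
    alert = False
    if history:
        last = history[-1][1]
        if last != 0:
            alert = abs(value - last) / abs(last) > alert_threshold
    history.append((timestamp, value))
    return alert


history = []
# Hypothetical hourly readings of one price.
readings = [("09:00", 100.0), ("10:00", 100.0), ("11:00", 104.0), ("12:00", 80.0)]
alerts = [ts for ts, price in readings if record_reading(history, ts, price)]

print(history)
print(alerts)
```

    The history list is what would be written out for charting; only the 12:00 reading moves far enough from the previous value to trigger an alert.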

    How PixieRobot Works

    PixieRobot runs the specified script, which calls a special function (ExecuteWWW) to retrieve the HTML of a selected Web page. The HTML is returned as a text string ("s"), which can be searched for the pre-pattern and post-pattern sub-strings surrounding the desired data. The text between the two patterns is the data to be extracted. The extracted data may itself be manipulated (e.g. converting the text to a numeric value) and then written to another file. If the default file is an .XLS file, the extracted raw HTML can be fed directly into a spreadsheet and formatted automatically. This .XLS technique can simplify the definition of delimited pattern searches (e.g. there is no need to find and remove <td> tags).
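
    The pre-pattern/post-pattern search described above can be illustrated outside PixieRobot. The following is a minimal sketch in Python (not PixieRobot's VBScript), assuming a case-insensitive match and a price formatted with a dollar sign and thousands separators. The helper extract_between and the sample HTML are hypothetical.

```python
import re


def extract_between(text, pre, post):
    """Return the text between the first `pre` and the next `post`.

    Both patterns are matched literally and case-insensitively.
    Returns None if the pair of patterns is not found.
    """
    pattern = re.escape(pre) + r"(.*?)" + re.escape(post)
    match = re.search(pattern, text, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None


# Hypothetical page fragment standing in for the HTML string "s".
html = "<p>Prices are subject to change.</p><TD class='price'>$1,250.00</td>"

raw = extract_between(html, "<td class='price'>", "</td>")
# Manipulate the extracted text: strip the formatting and convert to a number.
price = float(raw.replace("$", "").replace(",", ""))

print(raw, price)
```

    Returning None when a pattern is missing lets the calling script report a changed page layout instead of extracting the wrong text.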

    Data from a source transaction file may be read and inserted into the data elements of a Web page form, which automates a Web page update (submit) process (e.g. submitting product sales or purchase orders into another system, or submitting a product for an auction).
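
    PixieRobot fills in the form through the browser; the part that can be sketched independently is the mapping from transaction records to form fields. The Python example below (not PixieRobot script) assumes a CSV transaction file and expresses each filled-in form as a URL-encoded body. The column names, form field names, and sample rows are all hypothetical.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical transaction file contents; a real run would read a file on disk.
TRANSACTIONS = """sku,quantity,customer
A100,3,Acme Ltd
B200,1,Smith & Sons
"""

# Hypothetical mapping from transaction columns to the form's field names.
FIELD_MAP = {"sku": "product_code", "quantity": "qty", "customer": "cust_name"}


def build_form_bodies(csv_text, field_map):
    """Turn each transaction row into one URL-encoded form body."""
    bodies = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        form = {field_map[column]: value for column, value in row.items()}
        bodies.append(urlencode(form))
    return bodies


bodies = build_form_bodies(TRANSACTIONS, FIELD_MAP)
for body in bodies:
    print(body)
```

    Keeping the column-to-field mapping in one table means a change to the target form only needs a change in that table, not in the loop.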

    The HTML code patterns must be analysed and encapsulated in a PixieRobot script before the price data can be extracted. For example, the following VBScript scans the raw HTML to find the string "Prices are subject" and then searches for the <table and </table tags that follow. The data between the tags is extracted for subsequent use (e.g. stored in an MS Excel table).

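    Sub Main
    Monitor = True
    Silent = True
    ' Retrieve the raw HTML of the page as a text string
    s = ExecuteWWW("http://www.pixieware.com/totprices.htm")
    ' Find the marker text, then the first <table that follows it
    ' (the final argument 1 makes the search case-insensitive)
    iPos = Instr(1, s, "Prices are subject", 1)
    iPos = Instr(iPos, s, "<table ", 1)
    ' Discard everything before the table
    s = Mid(s, iPos)
    ' Keep everything up to and including the closing tag;
    ' "</table>" is 8 characters long, so it ends at iPos+7
    iPos = Instr(1, s, "</table", 1)
    s = Left(s, iPos+7)
    Call OutputToFile(s, "prices.xls")
    End Sub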

    The extracted string (held in variable "s") can be written to the nominated output file or emailed to a recipient's mailbox. With an .xls file extension, MS Excel reads the extracted HTML and formats the data into columns and rows in the same way a browser would.
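
    Where the data is wanted as CSV rather than as an HTML table opened in Excel, the extracted table can be split into rows and cells first. This is an alternative to the .xls technique, not something the script above does. The sketch uses Python's standard html.parser module; the class TableRows, the function table_to_csv, and the sample table are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(table_html):
    """Convert one HTML table to CSV text, one line per table row."""
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


sample = ("<table border=1><tr><th>Item</th><th>Price</th></tr>"
          "<tr><td>Widget</td><td>9.99</td></tr></table>")
print(table_to_csv(sample))
```

    This sketch handles a single flat table; nested tables or cells spanning several rows would need extra handling.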

    When running interactively, windows appear on the desktop to control PixieRobot's behavior, for example to change the script to be run. PixieRobot indicates when it has started and finished, and it displays the selected URL to verify that it is available for processing, thus providing a simple check that the script is working.

    NOTE: If PixieRobot is used to extract (scrape) data from a web-site, we expect all users of PixieRobot to read and comply with that site's legal notices (patent, copyright, trademark) and any other intellectual property assertions made by the site owners in respect of the use of their web-site contents. This message does not constitute legal advice and is not a substitute for the professional judgment of an attorney should you need assistance.





    Designed and built by PixieWare Software
