
    A network analyzer is a program that monitors the TCP/IP traffic between a web browser and a web server. Running a network analyzer while a typical web browser accesses the desired web server shows exactly what information is transmitted between the two.

    The network analyzer is useful during all of the bot’s development phases. Initially, the network analyzer can be used to analyze a typical session with the desired web server. This shows the HTTP requests and responses the bot must support. Once the bot is created, the network analyzer is used again during the debugging process and then to verify the final product.

Using a Network Analyzer to Design a Bot

    The first step in designing a bot is to analyze the HTTP requests and responses that flow between the web browser and web server. The bot will need to emulate these requests in order to obtain the desired data from the web server.
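    To make "emulate these requests" concrete, here is a minimal Python sketch. It assumes the raw text of one browser request has been copied out of the network analyzer as an HTTP/1.1 request with CRLF line endings. The helper names `parse_captured_request` and `to_urllib_request` are hypothetical, and the host, path and form fields in the sample are invented placeholders rather than data from a real session. The sketch only builds a request object that mirrors the captured one; it does not send anything.

```python
import urllib.request


def parse_captured_request(raw):
    """Split a raw HTTP/1.x request, as copied from a capture, into its parts."""
    # The header block ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request line, e.g. "POST /login.php HTTP/1.1".
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so "Host: example.com:8080" stays intact.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, target, version, headers, body


def to_urllib_request(raw, scheme="http"):
    """Build a urllib Request that repeats the captured browser request."""
    method, target, _version, headers, body = parse_captured_request(raw)
    url = scheme + "://" + headers["Host"] + target
    # When the request is sent, urllib adds Host and Content-Length itself,
    # so those two headers are not copied over.
    copied = {name: value for name, value in headers.items()
              if name.lower() not in ("host", "content-length")}
    data = body.encode("latin-1") if body else None
    return urllib.request.Request(url, data=data, headers=copied, method=method)


if __name__ == "__main__":
    # Placeholder capture; the host, path and form fields are invented.
    sample = (
        "POST /login.php HTTP/1.1\r\n"
        "Host: www.example.com\r\n"
        "User-Agent: Mozilla/5.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "Content-Length: 21\r\n"
        "\r\n"
        "user=bot&pass=secret1"
    )
    req = to_urllib_request(sample)
    print(req.get_method(), req.full_url)
```

    Passing the resulting object to `urllib.request.urlopen` would send it. Copying the browser's headers wholesale is a deliberate simplification; in practice some headers, such as cookies, have to come from the bot's own session rather than from the capture.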

    Sometimes this flow of requests and responses is difficult to determine. Simply viewing the HTML source of the pages and trying to work out what is going on can be a lengthy task. For sites that use techniques such as AJAX, where JavaScript issues additional requests in the background, the requests can become quite complex.

    To analyze the HTTP requests and properly design the bot, the network analyzer should first be started so that it begins recording network traffic; this is discussed later in this chapter. The web browser should then be launched. It is a good idea to clear the browser's cache at this point. The procedure for clearing the cache varies with each browser; the option is usually found among the browser's settings (in Internet Explorer, under Internet Options). Cached files may cause some information to be hidden from the network analyzer, because the browser can display a cached page without requesting it from the server again.
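    Cache interference can also be spotted after the fact. A browser that already holds a page may send a conditional request carrying an `If-Modified-Since` or `If-None-Match` header and receive a `304 Not Modified` reply with no body, so the page content never appears in the capture. The Python sketch below assumes the captured exchanges have already been reduced to (request headers, status code) pairs; that data layout and the `cache_affected` function are hypothetical.

```python
# Request headers that indicate the browser already had a cached copy.
CONDITIONAL_HEADERS = ("if-modified-since", "if-none-match")


def cache_affected(exchanges):
    """Return the indexes of exchanges that look influenced by the cache.

    Each exchange is a (request_headers, status_code) pair, where
    request_headers maps header names to values.
    """
    flagged = []
    for index, (headers, status) in enumerate(exchanges):
        # HTTP header names are case-insensitive, so compare in lower case.
        names = {name.lower() for name in headers}
        conditional = any(h in names for h in CONDITIONAL_HEADERS)
        # A 304 Not Modified reply carries no body, so the page content is
        # missing from the capture even though the request was recorded.
        if conditional or status == 304:
            flagged.append(index)
    return flagged
```

    If anything is flagged, clearing the cache and recording the session again is usually simpler than trying to reconstruct the missing responses.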

    Once the web browser is running, the desired web site should be accessed and used as a regular user would use it. The objective is to reach the data that the bot should retrieve. Take as direct a path to the desired data as possible; the simpler the path, the easier it will be to emulate. As the web site is navigated, the network analyzer records every request made by the web browser. To obtain the same data, the bot must send the same requests to the web server.
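    Once the path is recorded, the bot essentially replays the captured requests in the same order. The following is a minimal Python sketch, assuming the steps have been written down as (method, URL, form fields) tuples and that the site tracks the session with cookies. It uses `urllib.request` with an `http.cookiejar.CookieJar`, so cookies set by earlier responses are sent with later requests, as a browser would do. The step format, the default User-Agent string and the function names are hypothetical. The opener is passed in as a parameter so it can be replaced with a stub when testing.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_browser_like_opener(user_agent="Mozilla/5.0"):
    """Create an opener that keeps cookies between requests, like a browser."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Send the User-Agent recorded for the browser; this default is a placeholder.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def replay(steps, opener):
    """Issue each recorded (method, url, fields) step in order.

    Form fields are sent as a URL-encoded body (typical for POST); for GET
    steps, put any query string directly in the URL and pass None as fields.
    Returns the response bodies in the order the steps were issued.
    """
    bodies = []
    for method, url, fields in steps:
        data = urllib.parse.urlencode(fields).encode("ascii") if fields else None
        request = urllib.request.Request(url, data=data, method=method)
        with opener.open(request) as response:
            bodies.append(response.read())
    return bodies
```

    Against a live site this would be called as `replay(steps, make_browser_like_opener())`. Keeping every step in a single opener matters because many sites will reject a later request that arrives without the session cookie set by an earlier one.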

Using a Network Analyzer to Debug a Bot

    Creating a bot for some sites can be tricky. For example, a site's pages may exchange complex messages with the web server. If the bot does not reproduce these requests exactly, it will not function properly. When that happens, a network analyzer should be used to debug the bot.

    The technique that I normally use is to run the network analyzer while my bot runs. The network analyzer can track the HTTP requests issued by the bot just as easily as it can track the requests issued by a real user on a web browser.

    If the web server is not responding to the bot as expected, then at least one of the bot's HTTP requests most likely differs from what a regular web browser would send. The packets captured from the bot's session with the desired web site should then be compared with the packets captured from a regular browser session with the same site.
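    Part of this comparison can be automated. The Python sketch below assumes both captures have been reduced to header dictionaries for the same request, one from the browser and one from the bot. The hypothetical helper `diff_headers` lists the headers that are missing, extra or different in the bot's version. Header names are compared case-insensitively because HTTP header field names are case-insensitive. Headers such as `Cookie` legitimately differ between sessions, so they are skipped by default, and the result is a starting point for inspection rather than a verdict.

```python
def diff_headers(browser, bot, ignore=("cookie", "content-length")):
    """Compare the browser's and the bot's headers for the same request.

    Returns a dict with three lists:
      missing: headers the browser sent but the bot did not
      extra:   headers the bot sent but the browser did not
      changed: (name, browser_value, bot_value) for differing values
    """
    # Normalize names to lower case, since header names are case-insensitive.
    b = {name.lower(): value for name, value in browser.items()}
    m = {name.lower(): value for name, value in bot.items()}
    skip = {name.lower() for name in ignore}
    report = {"missing": [], "extra": [], "changed": []}
    for name in sorted(b.keys() | m.keys()):
        if name in skip:
            continue
        if name not in m:
            report["missing"].append(name)
        elif name not in b:
            report["extra"].append(name)
        elif b[name] != m[name]:
            report["changed"].append((name, b[name], m[name]))
    return report
```

    A missing `Referer` or a different `User-Agent` is a common reason for a site to treat a bot differently from a browser, so those entries are often the first ones worth checking. The request line (method and path) and any form data deserve the same comparison.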

    The next section shows how to use a network analyzer. Many different network analyzers are available; the one used in this book is Wireshark, a free, open source network analyzer that runs on a wide variety of operating systems.


Copyright 2005 - 2012 by Heaton Research, Inc. Heaton Research™ and Encog™ are trademarks of Heaton Research.