Stopping Automated Attack Tools
An analysis of web-based application techniques capable of defending against current and future automated attack tools

For an increasing number of organisations, their web-based applications and content delivery platforms represent some of their most prized and publicly visible business assets. Whether they are used to provide interactive customer services, vital client-server operations, or just to act as informational references, these assets are vulnerable to an increasing number of automated attack vectors – largely due to limitations within the core protocols and insecure application development techniques.

As these web-based applications become larger and more sophisticated, the probability of security flaws or vulnerabilities being incorporated into new developments has increased substantially. In fact, most security-conscious organisations now realise that their web-based applications are the largest single source of exploitable vulnerabilities.

Over recent years the ability to discover and identify these application flaws has become a critical assessment phase for both professional security agencies and would-be attackers. To increase the speed and reliability of identifying application-level vulnerabilities and potential exploitation vectors, both groups make extensive use of automated scanning tools.

These automated scanning tools are designed to take full advantage of the stateless nature of the HTTP protocol and insecure development techniques by bombarding the hosting server with specially crafted content requests and/or data submissions. Depending upon the nature of the scanning product, its purpose may be to create a duplicate of the client-visible content (e.g. content mirroring); search for specific content (e.g. administrative pages, backup files, e-mail addresses for spam); fuzz application variables to elicit server errors and uncover exploitable holes (e.g. SQL injection, cross-site scripting); or even to conduct a brute force discovery of hidden content or customer authentication credentials.

While there are a vast number of defensive strategies designed to help protect a web-based application against actual exploitation, very few of these strategies provide adequate defence against the initial phases of an attack – in particular the high dependency upon automated scanning tools.

By adopting a number of simple design criteria and/or incorporating minor code changes to existing applications, many organisations will find that the current generation of application scanning tools is ineffective in the discovery of probable security flaws, thereby helping to reduce the likelihood of future exploitation.

Automated Scanning

Given the size and complexity of modern web-based applications, the use of automated scanners to navigate, record and test for possible vulnerabilities has become a vital stage in confirming an application’s security. Without the use of automated scanning tools, the process of discovering existing security vulnerabilities is an extremely time-consuming task and, when done manually, dependent upon the raw skills of the security consultant or attacker.
Therefore, automated scanning tools are a key component in any attacker’s arsenal – particularly if they wish to identify and exploit a vulnerability with the least amount of effort and within the shortest possible timescale.

Developments in Automated Scanning

Just as web-based applications have evolved over the past decade, so too have the automated tools used to scan and uncover potential security vulnerabilities. Whilst the vast majority of these tools and techniques have come from non-commercial and “underground” sources, the quality of the tools is generally very high and they are more than capable of discovering vulnerabilities in most current application developments and/or deployments.

These automated scanning tools have undergone a series of evolutionary steps in order to overcome the security benefits of each advance in web-development technology. This evolution can be condensed into the following technological groupings or “generations”:

  • 1st Generation – The first generation of automated application scanners did no processing or interpretation of the content they attempted to retrieve. These tools would typically use lists of known file locations (e.g. file locations associated with common IIS administration pages, Compaq Insight Manager pages, Apache root paths, etc.) and sequentially request each URL. At the end of the scan, the attacker would have a list of valid file locations that could then be investigated manually. A common example of a 1st generation tool is a CGI Scanner.
  • 2nd Generation – The 2nd generation of automated scanners used a form of application logic to identify URLs or URL components contained within an HTML-based page (including the raw client-side scripting content) and navigate to any relevant linked pages – repeating this process as they navigate the host content (a process commonly referred to as ‘spidering’ or ‘spydering’). Depending upon the nature of the specific tool, it may just store the content locally (e.g. mirroring), it may inspect the retrieved content for key values (e.g. email addresses, developer comments, form variables, etc.), build up a dictionary of key words that could be used for later brute forcing attacks, or compile a list of other metrics of the application under investigation (e.g. error messages, file sizes, differences between file contents, etc.) for future reference.
  • 2.5 Generation – A slight advance over second generation scanners, this generation of scanners made use of a limited ability to reproduce or mimic the application’s presentation layer. This is typically accomplished by the tool memorising a number of default user clicks or data submissions to get to a key area within the application (e.g. logging into the application using valid credentials) and then continuing with standard 1st or 2nd generation tool processes afterwards. Automated scanning tools that utilise this approach are commonly used in the load or performance testing of an application. Also included within this generational grouping are scanning tools that can understand “onclick” events that build simple URLs.
  • 3rd Generation – 3rd generation scanning tools are capable of correctly interpreting client-side code (whether that be JavaScript, VBScript, Java, or some other “just in time” interpreted language) as if rendered in a standard browser, and executing in a fashion similar to a real user.
    Whilst there are literally thousands of tools that can be classed as 1st, 2nd or even 2.5 generation, there are currently no reliable 3rd generation scanning tools capable of correctly interpreting client-side code without a great deal of customisation or tuning for the specific web-technology application under investigation.

What is an automated scanner?

As far as web-based applications are concerned, there are a number of methods and security evaluation techniques that can be used to uncover information about an application that has a security context. An automated scanner makes use of one or more discovery techniques to request data, scans each page returned by the web server, and attempts to categorise or identify relevant information.
Within the security sphere, in the context of an attack, the key functions and discovery techniques that can be automated include the following:

  • Mirroring – The attacker seeks to capture or create a comprehensive copy of the application on a server or storage device of their choosing. This mirrored image of the application content can be used for:
    Theft and repackaging of intellectual property.
    Part of a customer deception crime such as man-in-the-middle attacks, phishing, or identity theft.
  • Site Scraping or Spidering – The attacker’s goal is to analyse all returned data and uncover useful information within the visible and non-visible sections of the HTML or client-side scripts. Information gleaned in this process can be used for:
    Harvesting of email addresses for spam lists.
    Social engineering attacks based upon personal data (such as names, telephone numbers, email addresses, etc.)
    Ascertaining backend server processes and software versions or revisions.
    Understanding development techniques and possible code bypasses based upon “hidden” comments and notes left behind by the application developer(s).
    Uncovering application details that will influence future phases in the exploitation of the application (e.g. references to “hidden” URLs, test accounts, interesting content, etc.).
    Mapping the structure of application URLs and content linking/referencing.
  • CGI Scanning – The inclusion of exhaustive lists of content locations, paths and file names to uncover existing application content that could be used in later examinations or for exploitation. Typically, the information being sought includes:
    Likely administrative pages or directories.
    Scripts and controls associated with different web servers and known to be vulnerable to exploitation.
    Default content and sample files.
    Common “hidden” directories or file path locations.
    Shared web services or content not directly referenced by the web-based application.
    File download repository locations.
    Files commonly associated with temporary content or backup versions.
  • Brute Forcing – Using this technique, an attacker attempts to brute force guess an important piece of data (e.g. a password or account number) to gain access to additional areas or functionality within the application. Common techniques make use of:
    Extensive dictionaries.
    Common file or directory path listings.
    Information gathered through site scraping, spidering and CGI scanning.
    Hybrid dictionaries that include the use of common obfuscation techniques such as elite-speak.
    Incremental iteration through all possible character combinations.
  • Fuzzing – Closely related to brute forcing, this process involves examining each form or application submission variable for poor handling of unexpected content. In recent years, many of the most dangerous application security vulnerabilities have been discovered using this technique. Typically each application variable is tested for:
    Buffer overflows,
    Type conversion handling,
    Cross-site scripting,
    SQL injection,
    File and directory path navigation,
    Differences between client-side and server-side validation processes.

Automated Tool Classes

When discussing automated application scanning and security tools, the most common references or classes for breakdown are:

  • Web Spider – any tool that will spider, scrape or mirror content. Search engines can often be included within this grouping.
  • CGI Scanner – any tool that uses a file or path reference list to identify URLs for future analysis or attack.
  • Brute Forcer – any tool capable of repetitive variable guessing – usually user IDs or passwords.
  • Fuzzer – typically an added function to a web spider or personal proxy tool which is used to iterate through a list of “dangerous content” in an attempt to elicit an unexpected error from the application. Any unexpected errors would be manually investigated later with the purpose being to extend the “dangerous content” into a viable attack vector.
  • Vulnerability Scanner – most often a complex automated tool that makes use of multiple vulnerability discovery techniques. For instance, the vulnerability scanner may choose to use spidering techniques to map the application, after which it inspects the HTML content to discover all data submission variables and then proceeds to submit a range of known bad characters or content to elicit an unexpected response. Finally, it attempts to classify any discovered vulnerabilities.

Frequently Used Defences

Over the years a number of defences have been experimented with in order to help protect against the use of automated scanning tools. Most of the defensive research and experimentation has been conducted by web sites that have to protect against tools that capture the contents of the web application/site (e.g. downloading of all images from a ‘porn’ site) or brute force guessing customer login credentials.

The 10 most frequently utilised defences are:

  • Renaming the server hosting software,
  • Blocking HEAD requests for content information,
  • Use of the REFERER field to evaluate previous link information,
  • Manipulation of Content-Type to “break” file downloads,
  • Client-side redirects to the real content location,
  • HTTP status codes to hide informational errors,
  • Triggering thresholds and timeouts to prevent repetitive content requests,
  • Single-use links to ensure users stick to a single navigation path,
  • Honeypot links to identify non-human requests,
  • Turing tests to block non-human content requests.

Server Host Renaming

An early method of thwarting 1st generation automated tools exploited their reliance upon the host server version information. These early tools checked exactly what type of web server they were running against by reading the Server variable within the HTTP response headers, and then used this information to select the most appropriate list of checks to execute.

Changing the Server variable from one server type/description to another (e.g. “Microsoft-IIS/5.0” becomes “Apache/1.3.19 (Unix)”) could often be enough to deceive the tool and prevent it from discovering vulnerable CGIs and URLs.
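
As a minimal sketch of this renaming, assume the application is served through a Python WSGI stack. The names rename_server and demo_app and the choice of banner are illustrative assumptions; in practice the change is usually made in the web server's own configuration, and a front-end server or proxy may still add its own Server value.

```python
def rename_server(app, banner="Apache/1.3.19 (Unix)"):
    """Wrap a WSGI application so every response advertises `banner`."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Drop any Server header set by the application, then add ours.
            kept = [(k, v) for k, v in headers if k.lower() != "server"]
            kept.append(("Server", banner))
            return start_response(status, kept, exc_info)
        return app(environ, patched_start)
    return wrapped


def demo_app(environ, start_response):
    # Hypothetical application that reveals its real platform.
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Server", "Microsoft-IIS/5.0")])
    return [b"hello"]
```

Wrapping the application once, for example application = rename_server(demo_app), is enough for every response to carry the substituted value.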

Blocking of HEAD Requests

There are a number of legitimate methods in which a client browser can request content from a web-based application. The most common, GET and POST, are used to elicit a response from the application server and typically receive HTML-based content. If the client browser does not want to receive the full content – but instead wishes to know whether a link exists or that the content is unchanged for instance – it can issue a HEAD request (with formatting almost identical to a GET request).

Many 1st generation automated scanners choose to use HEAD requests to spider an application or identify vulnerable CGIs instead of GET requests because less data is transferred and consequently the scanning or enumeration can be conducted at a greater speed.
Defending against automated scanners that rely upon HEAD requests is trivial. Almost all web hosting servers can be configured to not respond to HTTP HEAD requests – and only provide content via an approved list of HTTP methods. This type of configuration is quite common; however, there may be ramifications for data throughput (this may increase as any content request must now retrieve the full volume of data instead of just the file/page headers) and the number of dropped connections may also increase (some tools, after identifying that HEAD requests do not work, will use GET requests and forcibly drop the connection once they have received the header data within the GET response).
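
The approved-list idea can be sketched as a small WSGI wrapper in Python. The name restrict_methods, the contents of the allow-list and the decision to answer with “405 Method Not Allowed” are all assumptions made for illustration; a real deployment would normally express the same rule in the web server configuration.

```python
ALLOWED_METHODS = frozenset({"GET", "POST"})  # assumed allow-list


def restrict_methods(app, allowed=ALLOWED_METHODS):
    """Wrap a WSGI application so only approved HTTP methods reach it."""
    def wrapped(environ, start_response):
        method = environ.get("REQUEST_METHOD", "")
        if method not in allowed:
            # HEAD (and anything else not approved) never reaches the app.
            start_response("405 Method Not Allowed",
                           [("Allow", ", ".join(sorted(allowed))),
                            ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return wrapped
```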

Use of the REFERER Field

One of the most popular methods of governing access to the web application’s content is the use of the Referer entity-header field within the client browser’s submitted HTTP header. Ideally, each time a client web browser requests content or submits data, the HTTP header should contain a field indicating the source URL from which the client request was made. The application then uses this information to verify that the user’s request has come via an approved path – delivering the requested content if the referrer path is appropriate, or stopping the request if the Referer field is incorrect or missing.

For instance, suppose the user is browsing a content page that contains a link to a second page on the same server. By clicking on the link, the user makes an HTTP request to the server for the second page, and the request headers include a Referer field containing the URL of the first page.

The application must maintain a list (or use an algorithm) for validating appropriate access paths to the requested content, and will use the Referer information to verify that the user has indeed come from a valid link. It is not uncommon to reduce the amount of checking by restricting the check to verifying that the Referer field contains the same domain name – if not, the client browser is then redirected to the site’s main/initial/login page.

Many 1st and 2nd generation automated scanners do not use (or update) the Referer field within the HTTP header of each request. Therefore, by not processing content requests or submissions with missing or inappropriate Referer data, the application can often block these tools.

It is important to note that some browsers may be configured to not submit a Referer field, or to submit a link or data of the user’s choice as a method of reducing any leakage of personal information. Additionally, if the user follows a link from another site (e.g. a search engine) or from their saved favourites, any content restrictions based upon Referer information will also be triggered.
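
The reduced same-domain check described above might look like the following Python sketch. The function name referer_is_acceptable and the host name used in the usage note are hypothetical, and treating a missing Referer as a failure is a policy assumption that will also turn away the legitimate users mentioned in the previous paragraph.

```python
from urllib.parse import urlsplit


def referer_is_acceptable(referer, own_host):
    """Return True when the Referer header names a page on our own host."""
    if not referer:
        return False  # missing header: treated here as not acceptable
    host = urlsplit(referer).hostname
    # Compare the whole host name so that look-alike hosts do not pass.
    return host is not None and host.lower() == own_host.lower()
```

For example, referer_is_acceptable("http://www.example.com/index.html", "www.example.com") is accepted, while a request arriving from a search engine results page is not.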

Content-Type Manipulation

Another method of preventing automated tools from downloading vast amounts of site content is through the use of Content-Type entity-header field manipulation.

The Content-Type field is typically used to indicate the media type of the entity-body sent to the recipient or, in the case of a HEAD request, the media type that would have been sent had the request been a GET request. For example, a server returning an HTML page sets the Content-Type field of its response to text/html.

Alternatively, MIME Content-Type can be defined within the actual content through META tags using the HTTP-EQUIV attribute. Tags using this form are supposed to have the equivalent effect when specified as an HTTP header, and in some servers may be translated to actual HTTP headers automatically or by a pre-processing tool.

The application server can define a MIME Content-Type for each and every data object, and this is normally used to define how the client browser should interpret the data. There are dozens of content types defined and in common usage, with more being defined all the time. Some of the most frequently encountered definitions include text/html, text/plain, image/gif, image/jpeg and application/pdf.

By altering file extensions and assigning them non-default MIME types through the use of the server’s Content-Type response, it is often possible to trick 1st and 2nd generation automated scanning tools into either ignoring application links or misinterpreting the data they receive.

Automated web spiders and vulnerability scanners are tuned to ignore files that do not contain HTML content (e.g. GIF, JPG, PDF, AVI, DOC, etc.) and the majority of existing tools do not analyse MIME information contained within server HTTP headers. Therefore, for example, by renaming .HTML files to .JPG and ensuring that the Content-Type remains “text/html”, a document containing valid HTML content but called “ImGoingHere.jpg” will be correctly rendered as a web page in a browser, but will be ignored by an automated scanner.
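
A sketch of how a server-side handler could pick the Content-Type for such disguised files, written in Python. The DISGUISED table and the function content_type_for are hypothetical names; only the file name “ImGoingHere.jpg” comes from the example above.

```python
import mimetypes

# Hypothetical override table: paths whose extension deliberately misleads.
DISGUISED = {"/ImGoingHere.jpg": "text/html"}


def content_type_for(path):
    """Choose the Content-Type header value to send for `path`."""
    if path in DISGUISED:
        # The extension says "image" but the body is really an HTML page.
        return DISGUISED[path]
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"
```

A browser trusts the header and renders the page, whereas a tool that filters links by file extension never requests it.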

HTTP Status Codes

The majority of users are familiar with the common status codes “200 OK”, “302 Found” (a redirect) and “404 Not Found”. The HTTP protocol provides for a multitude of status codes which a web server can select and send to the client browser following a data connection or request. These status codes are divided into the following 5 key groupings:

  • 1xx – Informational
  • 2xx – Successful
  • 3xx – Redirection
  • 4xx – Client Error
  • 5xx – Server Error

From an automated tool’s perspective, the “200 OK” status code is typically interpreted to mean that a valid request was made to the server (e.g. the page exists and the URL is correct), while any returned status code in the 4xx and 5xx groupings can be used to ascertain whether the request was invalid or triggered a server-side fault. Depending upon the nature of the automated tool, a 5xx status response could indicate that malicious content insertion may be possible (e.g. SQL injection, unsigned integer denial of service) and is worthy of manual investigation and further attack.

For instance, a CGI scanner will cycle through a list of known files and file paths – rapidly requesting content from the web-based application server. If a “200 OK” is received, the CGI scanner then reports to the attacker that the path or vulnerable page/content exists. If a “404 Not Found” is received, the scanner assumes that the content does not exist and is therefore not vulnerable to that attack vector – and most likely will not report anything back to the attacker.

However, all modern HTTP web servers allow for bespoke error handling and customisation of status code representations. Consequently, a highly successful method of defeating the usefulness of automated scanners is to always present the same status code (i.e. “200 OK”) for every request – regardless of whether the request was legitimate, requested non-existent content, or generated an unknown server error. This means that the automated scanner cannot base its findings on HTTP status codes, and must then use some form of content inspection logic to analyse the actual content of the HTML body instead.
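
One way to sketch this status masking is as a Python WSGI wrapper. The names mask_error_status and GENERIC_BODY are assumptions, as is the design choice to rewrite only the 4xx and 5xx groupings so that ordinary redirects keep working. The sketch only catches faults raised before the application returns its response body.

```python
import sys

GENERIC_BODY = b"<html><body>The requested page is unavailable.</body></html>"


def mask_error_status(app):
    """Wrap a WSGI application so 4xx and 5xx outcomes are sent as 200 OK."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            if status[:1] in ("4", "5"):
                status = "200 OK"  # hide the real outcome from the scanner
            return start_response(status, headers, exc_info)
        try:
            return app(environ, patched_start)
        except Exception:
            # Unhandled fault: answer with the same generic success page.
            headers = [("Content-Type", "text/html"),
                       ("Content-Length", str(len(GENERIC_BODY)))]
            start_response("200 OK", headers, sys.exc_info())
            return [GENERIC_BODY]
    return wrapped
```

The body returned for missing content should also look like an ordinary page, otherwise a scanner that inspects content can still tell the two cases apart.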

Client-side Redirection

For many automated scanners, the process of identifying a link or embedded URL is done by searching for relevant “HREF=” references within the HTML content. However, there are a number of alternative methods for indicating URLs within the HTML body of a server response.

A mechanism called “client-side redirection” is commonly used to redirect browsers to the correct content location after requesting invalid, nonexistent or recently moved content. The most common non-scripted method is through the use of the “Refresh” field (note that the “Refresh” field also allows for a wait period before being automatically redirected). Just like the “Content-Type” field, the “Refresh” field can be contained within the HTTP header or used within an HTTP-EQUIV META tag.

To use client-side redirection as a protective measure against automated scanners, the application developer must ensure that each URL for (valuable) content is initially intercepted by a page designed to automatically redirect the client browser to the correct/real content. For additional security, the application server could also enforce a minimum “wait” time before responding to requests for the real content.

The effect on many automated scanning tools is to induce a “200 OK” status code for each request – thereby providing many of the benefits described in the HTTP Status Codes section above.
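
The intercepting page can be sketched in Python as a function that builds a small HTML document whose only pointer to the real content is a “Refresh” META tag. The function name redirect_page and its default wait period are assumptions made for illustration.

```python
from html import escape


def redirect_page(real_url, wait_seconds=2):
    """Build an intercepting page that sends the browser on to `real_url`."""
    target = escape(real_url, quote=True)  # keep the attribute well-formed
    return (
        "<html><head>"
        f'<meta http-equiv="Refresh" content="{int(wait_seconds)}; URL={target}">'
        "</head><body></body></html>"
    )
```

Because the generated page contains no “HREF=” reference, a tool that only searches for such references finds nothing to follow.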

Thresholds and Timeouts

In applications where session IDs are used to maintain the state of a connection (e.g. uniquely track the user or identify the fact that they have already successfully authenticated themselves), it is also common practice to measure two key interaction variables – the time and frequency of each request or data submission.

Normally, by monitoring the elapsed time since the last data submission, an application can “timeout” a session and force the user to re-login if they have not used the application for an extended period (e.g. an e-banking application that automatically logs out the user after 5 minutes of inactivity). However, it is also possible to monitor the time taken between data submissions – thereby identifying whether an automated tool is processing URLs at a speed that is unattainable or unlikely for a legitimate human user.

In addition, multiple requests for the same application content using the same session ID can also be monitored. This is commonly implemented as part of an authentication process designed to identify brute force guessing attacks (e.g. repeated guesses at the password associated with an email address on a free web-mail application server) – typically tied to account lockout and/or session cancellation. A similar process can be used to identify repeated attempts to access or submit to the same URL (e.g. a particular CGI or page) – as would occur during a fuzzing attack using an automated tool.

Consider a captured HTTP POST data submission to the application server in which the attacker is fuzzing the “Account” field of the “/Toys/IwantToBuy.aspx” page by repeatedly trying different attack strings (e.g. ‘;--<H1>). We know that it is the same attacker because all previous requests have used the same session ID. To identify the attack, the application server maintains a couple of extra data variables associated with the session ID information in its backend database – in this case “last requested URL” and a numeric counter. Each time the “last requested URL” is the same, the counter is incremented. Once the counter threshold is reached (e.g. 5 repeated requests), the session ID is revoked and any subsequent data submissions using that session ID are then ignored.

The use of thresholds and timeouts within an application can prove to be successful against all generations of automated scanner. However, once an attacker understands the limits of these two mechanisms (i.e. how many times the same page can be requested, and how “slow” the requests need to be to pass as human), the automated tools can often be configured to not trigger these application responses.
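
The “last requested URL” counter and the request-speed check can be sketched together in Python. The class name SessionMonitor, the in-memory storage and the default limits are assumptions; the text above describes keeping these variables in a backend database, and real limits would need tuning so that legitimate users are not locked out.

```python
import time


class SessionMonitor:
    """Track repeated and overly fast requests for each session ID."""

    def __init__(self, max_repeats=5, min_interval=0.5, clock=time.monotonic):
        self.max_repeats = max_repeats    # consecutive hits allowed on one URL
        self.min_interval = min_interval  # seconds a human is assumed to need
        self.clock = clock
        self.state = {}                   # session_id -> (url, count, time)
        self.revoked = set()

    def allow(self, session_id, url):
        if session_id in self.revoked:
            return False                  # revoked sessions are ignored
        now = self.clock()
        last_url, count, last_time = self.state.get(session_id, (None, 0, None))
        too_fast = last_time is not None and (now - last_time) < self.min_interval
        count = count + 1 if url == last_url else 1
        self.state[session_id] = (url, count, now)
        if too_fast or count > self.max_repeats:
            self.revoked.add(session_id)
            return False
        return True
```

With the defaults shown, the sixth consecutive request for the same URL, or any request arriving less than half a second after the previous one, revokes the session.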

Onetime Links

Related to the application logic utilised in managing URL requests and data submissions through the HTTP Referer field, in some cases it is possible to assign a unique “referrer value” to each page the client browser requests. This “referrer value” is then used to manage the location of the user within the application and identify any requests deemed to be out of order.

For instance, consider an online retailer’s purchasing page, /BuyStageOne.aspx?track=1104569, whose page content contains a number of URLs.

Each URL, including the user’s current location, carries a tracker variable (“track=”) with a numeric value (initially “1104569”). If the user clicks on any link, this tracker value will also be submitted to the application server. Now, assuming that the user clicks on the link to proceed with the purchasing process, he will proceed to the page “BuyStageTwo.aspx”, but will also be issued with a new unique tracker number (for example, /BuyStageTwo.aspx?track=1104570), and every URL in the contents of the new page will carry that new value.

Key things to note with this onetime link anti-scanner implementation are:

  • The tracking number changes with each page, and the earlier number is revoked so that it cannot be used again by the user.
  • Tracking numbers are bound to a per-user session ID.
  • Before the application will process any page request or data submission, it must first verify the integrity of the session (i.e. that the session ID is real and has not been revoked) and then verify that the tracking value is correct.
  • Each URL or link, including the link “back” to the previous page (/BuyStageOne.aspx) has a new tracking number. The default browser “back” and “forward” buttons will not work – therefore this functionality must be provided within the page itself.
  • Any attempt to follow a URL without a tracking number, or use an invalid tracking number, would be handled by the application as either a user error or seen as an attack (automated or otherwise).
  • Although this example uses a sequential increase in tracker numbers, this is not necessary and the values could be random if required (the use of random tokens is recommended).
  • Whenever the user requests application content containing the correct tracker number, the tracking value can only be used once as a new value is assigned with the server response.

This kind of implementation is successful against most 1st and 2nd generation automated scanners. Many spidering and mirroring tools parallelise their requests to speed up the discovery/download process and would therefore fail to handle the per-request changing tracking numbers. Fuzzers too would be affected by this location state management system.
Note: whilst the examples above make use of URLs containing tracking numbers, the use of HTTP POST submissions instead of GET requests is recommended. For a full discussion on the best security practices for URL handling, readers are directed to the paper “Host Naming and URL Conventions” by the same author.
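
The server-side bookkeeping for onetime links can be sketched in Python as follows. The class name OneTimeTracker and the in-memory dictionary are assumptions, and the sketch uses random tokens rather than the sequential numbers of the example, in line with the recommendation in the list above.

```python
import secrets


class OneTimeTracker:
    """Issue and redeem single-use tracking values bound to a session ID."""

    def __init__(self):
        self._current = {}  # session_id -> token embedded in the last page

    def issue(self, session_id):
        """Create the tracking value for the page about to be served."""
        token = secrets.token_urlsafe(16)
        self._current[session_id] = token  # overwriting revokes the old value
        return token

    def redeem(self, session_id, token):
        """Accept a tracking value once; any later reuse is refused."""
        expected = self._current.get(session_id)
        if expected is None:
            return False
        if not secrets.compare_digest(expected.encode(), token.encode()):
            return False
        del self._current[session_id]
        return True
```

A request handler would call redeem on the submitted value before doing any other processing, and call issue again when building the links for the response.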

Honeypot Links

Since many scanning tools will automatically identify URLs within the HTML body of a page and blindly request linked content, it is possible to include “hidden” links within an application’s content that will direct an automated tool to a continually monitored page. Fake or monitored links such as these fall under the generic category of “honeypots”. By embedding these links within the HTML body in such a way that they would never be visibly rendered or “clickable” by a human user, any client request for this “hidden” content is most probably associated with an attack.

For example, comment fields (i.e. <!-- and --> ) and background colours (i.e. setting the link colour to be the same as the background colour) can be used to “hide” URLs that would not normally be followed by a human user, but are typically followed by automated tools.

The web-based application would be designed in such a way that automated responses (e.g. session ID cancellation, automatic logoff, blocking of the attacker’s IP address, detailed forensics logging, etc.) are initiated should any request be made to access a honeypot link. Against standard automated scanners, the most likely response is to issue a default page (e.g. the home page) for all requests from that IP address or session ID – no matter what the request is – and initiate any background investigative processes.
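
The default-page response can be sketched in Python as a small routing guard. The class name HoneypotGuard, the trap paths and the default page are hypothetical, and the sketch keeps its list of flagged clients in memory only.

```python
HONEYPOT_PATHS = frozenset({"/admin/backup.html", "/private/old-login.html"})


class HoneypotGuard:
    """Serve only the default page to any client that touches a trap link."""

    def __init__(self, trap_paths=HONEYPOT_PATHS, default_page="/index.html"):
        self.trap_paths = set(trap_paths)
        self.default_page = default_page
        self.flagged = set()  # client identifiers (IP address or session ID)

    def route(self, client_id, path):
        """Return the path that should actually be served to this client."""
        if path in self.trap_paths:
            self.flagged.add(client_id)  # a human would never see this link
        if client_id in self.flagged:
            return self.default_page
        return path
```

Logging or other investigative steps would be triggered at the point where the client is first flagged.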

Graphical & Audio Turing Tests

There are a number of ways in which the application can force the user to interpret onscreen or audio information, and submit a response that could not normally be supplied through an automated process (unless you include brute-force guessing) before proceeding into another section of the application. The most common implementations make use of graphical images containing a key word or value that cannot be discovered using tools (such as OCR), but must be manually entered into a form field by the user.
For example, the following graphic is copied from the account creation phase of the Microsoft Passport online service. The background squiggles and leaning text are designed to help prevent automated OCR (Optical Character Recognition) packages from evaluating the text “597UTPH7”.

Unfortunately, graphics such as the one above can be very difficult for some people to understand due to their complexity or the user’s personal circumstances (e.g. colour blindness, failing sight). Therefore, Turing tests that make use of audio sound bites can be offered as an alternative. Microsoft’s Passport registration also allows users to listen to a voice saying the pass phrase which must be entered correctly to set up the account. To make the process more difficult for automated dictation tools, some background noises and hisses may be included with the real pass phrase data. An example of Microsoft Passport support for a voice-based Turing test is shown below.

This kind of user identification testing is typically used at key points within high-volume applications (e.g. popular webmail services, online domain registration queries, etc.) that have experienced, or are likely to experience, attacks or use for non-authorised activities. Its purpose is to validate that it is a real person using the application – not an automated tool.

In theory, a real person can be differentiated from a tool or computer system by means of a specific test. These tests are often called Turing tests, and recent work in this area has led to the development of CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) systems for web-based applications.

Anti-tool Client-side Code

Whilst the techniques examined in the previous section provide various degrees of protection against automated tool attacks, there exists an additional array of defences capable of defending against all but 3rd generation scanners. These additional defences make use of client-side code.

The use of client-side code, from a security perspective, tends to be overlooked – largely due to a poor understanding of the different coding techniques and adverse publicity associated with frequent client-server content validation flaws. Although an attacker can indeed bypass client-side scripting components that validate content or enforce a sequence of events within the client browser fairly easily, it is still possible to use client-side code as a positive security component as long as suitable validation occurs at the server-side.

The Strengths of Client-side Code

As a mechanism for protecting against automated attack tools and scanners, client-side code provides numerous advantages over other protection mechanisms. However, the greatest advantage is derived from the fact that current automated tools either cannot execute the code, or are extremely limited in their ability to interpret any embedded code elements.

The trick to using client-side code in a security context lies in ensuring that the client browser really did execute the code (i.e. validating execution) and did not simply ignore or bypass it. This can be achieved by forcing the client browser to submit a unique value that can only be obtained as part of the actual code execution. These code execution values, or “tokens”, are submitted with any data request or submission, and validated by the server-side application prior to the processing of any other client-supplied data. This process can be referred to as “tokenisation”.

While the final client-side code implementation may take on many forms, this functionality can be achieved using any modern client-side interpreted language including JavaScript, VBScript, Java, or even Flash. In fact, if so required, even compiled client-side components (e.g. ActiveX) could be used so long as the client browser is likely to have them installed, although this is not recommended due to probable code flexibility issues. For ease of implementation it is recommended that client-side interpreted languages, which are available by default within modern client browsers, be used.

Client-side Scripting Alternatives

There are a near infinite number of ways to utilise client-side code elements as a protection device against automated attack tools – with each one influenced by factors such as the nature of the web-based application (e.g. e-banking, retailing, informational, etc.), the type of user (e.g. customer, administrator, associate, etc.), or even the personal preferences of the development staff.
However, there are three primary classes of client-side code elements capable of defending against most automated attack tools:

  • Token Appending.
  • Token Calculator.
  • Token Resource Metering.

Token Appending

The simplest of the client-side scripting techniques, token appending makes use of pre-calculated tokens embedded within the HTML body of the server-supplied content which must then be appended to any data submission or request by the client browser.

For instance, in the example below the HTML content contains a dynamically built link that uses JavaScript to populate the missing “token” value. Any tool that inspects the raw HTML is likely to identify the HREF entity but fail to include the necessary token value.
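The original listing is not reproduced here, so the following is only a minimal sketch of how such a link might be built. The path "/buy.aspx", the element id "buy-link" and the helper name buildTokenLink are assumptions made for illustration; the token value is the one quoted later in the text.

```javascript
// Hypothetical value: in practice the server would embed a fresh token in each page.
const pageToken = "0a37847ea23b984012";

// Build the real link target at run time, so that the static href attribute
// seen by a tool parsing the raw HTML never contains the token parameter.
function buildTokenLink(basePath, token) {
  return basePath + "?token=" + encodeURIComponent(token);
}

// In a browser, complete the link once the page has loaded.
// Assumed markup: <a id="buy-link" href="/buy.aspx">Buy now</a>
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", function () {
    const link = document.getElementById("buy-link");
    if (link) {
      link.href = buildTokenLink("/buy.aspx", pageToken);
    }
  });
}
```

A tool that simply follows the static href would request "/buy.aspx" without the token parameter, which the server-side application can treat as a sign of automated access.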

Alternatively, in the second example below, we see a POST submission form version. A JavaScript function, “addtoken()”, exists in the head section of the HTML document which is called from the submission form (“myform”) with the onClick routine. By default, the “token” field is set to “Fail” – meaning that any failure to process the JavaScript correctly will result in a POST submission containing the data “token=Fail” and would be interpreted by the server-side application as a possible attack.

The only valid way of submitting data using this form is by executing the “addtoken()” function after clicking on the submit button. This JavaScript function then modifies the “token” value by replacing the default “Fail” with the real value (“0a37847ea23b984012”) and completes the submission.
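As the original listing is not reproduced here, the form described above might look roughly like the sketch below. The form name, field name, function name and token value follow the text; the action URL, the extra "comment" field and the server-side helper classifySubmission are assumptions added for illustration.

```javascript
// Assumed markup (shown as a comment so that this block stays plain JavaScript):
//
//   <form name="myform" action="/submit.aspx" method="POST">
//     <input type="hidden" name="token" value="Fail">
//     <input type="text" name="comment">
//     <input type="submit" value="Send" onClick="return addtoken(document.myform)">
//   </form>

// Client side: replace the default "Fail" value with the real token
// immediately before the browser sends the form.
function addtoken(form) {
  form.token.value = "0a37847ea23b984012";
  return true; // returning true lets the normal submission continue
}

// Server side (hypothetical helper): inspect the token before any other field.
function classifySubmission(fields, expectedToken) {
  if (typeof fields.token !== "string" || fields.token !== expectedToken) {
    return "possible-attack"; // covers "Fail", a missing token and a wrong token
  }
  return "ok";
}
```

A client that never runs the script posts "token=Fail", so the server can reject or log the request before looking at the rest of the submitted data.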

The principles governing this tokenisation of a link are very similar to those discussed in section 3.8 “Onetime Links”, and the server-side responses to an identified attack can be the same.

Token Calculator

Using techniques almost identical to those of the Token Appending class discussed previously, the Token Calculator class extends these principles by adding a dynamic token creation process.

Instead of using static tokens (e.g. token.value="0a37847ea23b984012"), the client-side script functionality is extended to include routines that actually calculate a token from scratch. For example, in the code snippet below, JavaScript is used to combine the “fake” default token with the session cookie and the page name, and then calculate a CRC32 checksum – that then replaces the “fake” token – and submits the form data to the application.
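The original snippet is not reproduced here, so the following sketch shows one way the calculation could be written. It uses the standard table-driven CRC-32 (reflected polynomial 0xEDB88320); the helper name calculateToken, the order in which the three values are joined, and the use of document.cookie and location.pathname as the cookie and page name sources are assumptions. It also assumes that the session cookie is readable by script, which is not the case for cookies flagged HttpOnly.

```javascript
// Build the standard CRC-32 lookup table once.
const CRC_TABLE = (function () {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
    }
    table[n] = c >>> 0;
  }
  return table;
})();

// CRC-32 of a string, returned as 8 lowercase hex digits.
// Assumes plain ASCII input (one byte per character).
function crc32(text) {
  let crc = 0xFFFFFFFF;
  for (let i = 0; i < text.length; i++) {
    crc = CRC_TABLE[(crc ^ text.charCodeAt(i)) & 0xFF] ^ (crc >>> 8);
  }
  return ((crc ^ 0xFFFFFFFF) >>> 0).toString(16).padStart(8, "0");
}

// Combine the "fake" default token, the session cookie and the page name,
// then checksum the result. The server repeats the same steps to verify.
function calculateToken(fakeToken, sessionCookie, pageName) {
  return crc32(fakeToken + sessionCookie + pageName);
}

// Browser-side form handler: swap the fake token for the calculated one.
function addtoken(form) {
  form.token.value = calculateToken(form.token.value, document.cookie, location.pathname);
  return true;
}
```

A CRC is a checksum rather than a secret: anyone who reads the script can reproduce it. The protection comes from tools that do not execute the script at all, not from the strength of the arithmetic.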

The routines used to calculate the token can take practically any form and may be as complex or as simple as the application developer feels comfortable with. The only limitation is that the server-side application must be able to verify the integrity and correctness of the token with each client browser data request or submission. Some examples of token calculation include:

  • Concatenating several text variables embedded within the HTML document to create a single submission token.
  • Performing a mathematical routine based upon variables embedded within the HTML document to create a numeric submission token.
  • Using HTML document properties (e.g. session cookies, browser-type, referrer field, etc.) or other user supplied form fields (e.g. user name, date of birth, etc.) to create a unique submission token for that page.
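The first two approaches in the list above, together with the server-side check they depend on, can be sketched as follows. All function names and the arithmetic formula are hypothetical and chosen purely for illustration.

```javascript
// Concatenation: join several values that the server embedded in the page.
function concatToken(parts) {
  return parts.join("");
}

// Arithmetic: an arbitrary routine over numbers embedded in the page.
// The inputs are assumed small enough for a * b + c to stay an exact integer.
function mathToken(a, b, c) {
  return String((a * b + c) % 1000003);
}

// Server side: recompute the expected token from the values issued for this
// page and compare it with the submitted one before reading other fields.
function verifyToken(submitted, expected) {
  return typeof submitted === "string" && submitted === expected;
}

// Example round trip with made-up page values.
const issued = { a: 1234, b: 5678, c: 91 };
const fromClient = mathToken(issued.a, issued.b, issued.c);
const accepted = verifyToken(fromClient, mathToken(issued.a, issued.b, issued.c));
```

Whatever routine is chosen, the server needs to remember, or be able to re-derive, the inputs it issued for each page so that it can repeat the calculation.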

Token Resource Metering

Token Resource Metering extends and refines the principles of the Token Calculator and Token Appending strategies by increasing the complexity of the client-side code execution so that the client browser, when calculating the token, incurs a measurable time delay. The calculated token forms an “electronic payment” and can be used to slow down an automated attack. This process of slowing down data requests or submissions is commonly referred to as “Resource Metering”.

The trick to a successful resource metering strategy lies in ensuring that the token is cheap for the server-side application to issue and validate, but requires measurable effort to calculate at the client-side. For a full explanation of Resource Metering and how it can be used in a security context, readers are directed to the comprehensive paper titled “Anti Brute Force Resource Metering”, by the same author.
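One common way to obtain this asymmetry is a hashcash-style puzzle, sketched below. This is an illustrative stand-in, not necessarily the scheme described in the referenced paper. The function names are hypothetical, and the toy FNV-1a hash is used only to keep the sketch free of dependencies; a real deployment would be expected to use a cryptographic hash such as SHA-256.

```javascript
// Toy 32-bit FNV-1a hash. NOT cryptographically secure: it stands in for a
// real hash function purely for illustration.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// True when the hash of "challenge:nonce" has at least `bits` leading zero bits.
function meetsTarget(challenge, nonce, bits) {
  return fnv1a(challenge + ":" + nonce) < Math.pow(2, 32 - bits);
}

// Client side: brute-force search, costing about 2^bits hash evaluations on average.
function solveChallenge(challenge, bits, maxTries = 50000000) {
  for (let nonce = 0; nonce < maxTries; nonce++) {
    if (meetsTarget(challenge, nonce, bits)) {
      return nonce;
    }
  }
  throw new Error("no nonce found within maxTries");
}

// Server side: a single hash evaluation confirms that the work was done.
function verifyWork(challenge, nonce, bits) {
  return Number.isInteger(nonce) && nonce >= 0 && meetsTarget(challenge, nonce, bits);
}
```

The server would issue a fresh challenge per page or session and raise the bits value to lengthen the delay; each additional bit roughly doubles the client's expected work while the server's verification cost stays constant.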


The use of automated tools to identify security weaknesses within web-based applications and to attack vulnerable content is an increasingly common practice. Therefore, it is important that organisations take adequate precautions to defend against the diverse range of tool techniques and increasingly sophisticated automated scanners used by current and future attackers.

The methods described within this whitepaper provide varying degrees of protection against these attack tools. Simple techniques such as changing host service names, blocking of HTTP HEAD requests and the use of non-informative status codes should be considered an absolute minimum for today’s environments. More sophisticated techniques requiring tighter integration with the dynamically generated application content help to provide better protection, but must be factored early on into the application development lifecycle if they are to be effective.

As the attack tools become even more sophisticated and overcome many of the simpler defence techniques, organisations will be forced to consider the use of client-side code techniques. These techniques are able to stop all of the generic automated attack tools currently available, including most of the more sophisticated commercial vulnerability scanners (this is important since keygens and licence-bypass patches are in common use).

Comparative Studies

The techniques explained within this whitepaper for stopping automated attack tools all have their own unique strengths and weaknesses. For an organisation seeking to protect its web-based application from future attack, it is important that the appropriate defensive strategy be adopted and that the right anti-automated tool technique is applied. The following tables help to provide a comparative study of the different techniques previously discussed. However, it is important that application designers realise that these comparisons are of a general nature only (due to the phenomenal array of different automated attack tools which are currently available).

Technique vs. Tool Generation and Classification

Technique vs. Implementation and Client Impact

Authorised Vulnerability Scanning and Security Testing

It is important to understand that automated attack tools play an important role in the legitimate identification of security vulnerabilities. Organisations should ensure that any anti-tool defences they install can be overcome for authorised security testing. Failure to include such a mechanism is likely to result in extended and difficult security assessment and penetration testing exercises, which are likely to lead to higher costs or a less thorough evaluation.

It is recommended that system administrators and application developers provide a mechanism to turn off many of the more sophisticated anti-automated tool defences based upon factors such as IP address, connection interface or even reserved SessionIDs. For most of the defensive techniques discussed in this paper, any of these mechanisms could be used. However, care should be taken to ensure that this “bypass” mechanism is switched off by default, and is only temporarily enabled to allow automated security testing of the application and its hosting environment.
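A minimal sketch of such a bypass check is shown below, assuming a hypothetical configuration object and request shape. The IP address (taken from a range reserved for documentation), the session ID and all names are placeholders. The expiry timestamp is an added assumption that keeps the bypass temporary.

```javascript
// Hypothetical configuration: the bypass is off unless explicitly enabled,
// and it expires on its own so that it cannot be left on by accident.
const bypassConfig = {
  enabled: false,
  expiresAt: 0,                          // epoch milliseconds
  allowedIps: ["192.0.2.10"],            // placeholder tester address
  reservedSessionIds: ["PENTEST-0001"]   // placeholder reserved SessionID
};

// Returns true only while the bypass is enabled, unexpired, and the request
// comes from an authorised address or carries a reserved SessionID.
function bypassApplies(request, config, now) {
  if (!config.enabled || now >= config.expiresAt) {
    return false;
  }
  return config.allowedIps.includes(request.ip) ||
         config.reservedSessionIds.includes(request.sessionId);
}
```

During an authorised test window, an administrator would set enabled to true and expiresAt to the end of the window; every other request continues to pass through the normal anti-tool defences.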

Combining Defence Techniques

It is important that organisations design their online web-based applications in such a manner as to take advantage of as many of the anti-tool defensive techniques as possible. The ability to stack and combine compatible techniques will strengthen the application against attack, providing valuable defence in depth.

Custom Attack Tools

The techniques outlined in this whitepaper provide varying degrees of defence against automated attack tools which are available through most commercial, freeware and underground sources. However, it is important to note that, should an attacker seek to purposefully target an organisation and take the necessary time and effort to fully engage or evaluate an application’s defences, it is likely that they will be able to construct a custom automated attack tool capable of bypassing most of them.
Organisations should evaluate the likelihood of a potential attacker crafting a custom tool to overcome the application’s anti-tool defences and seek to adopt appropriate strategies for detecting anomalies in application usage (e.g. repetitive data submissions, high volumes of network traffic at odd hours from out of zone IP addresses, repeated use of the same credit card, etc.).
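As one illustration of detecting repetitive submissions, the sketch below counts events per key inside a sliding time window. The function name, the choice of key (for example a source IP address or a card number) and the thresholds are all assumptions; a production system would also need to expire idle keys and persist state across servers.

```javascript
// Counts events per key inside a sliding time window and reports when the
// number of recent events exceeds a threshold.
function createAnomalyDetector(windowMs, maxEvents) {
  const history = new Map(); // key -> array of event timestamps (ms)

  return function record(key, now) {
    // Keep only the events that are still inside the window, then add this one.
    const recent = (history.get(key) || []).filter(function (t) {
      return now - t < windowMs;
    });
    recent.push(now);
    history.set(key, recent);
    return recent.length > maxEvents; // true means "looks anomalous"
  };
}

// Example: flag more than three uses of the same card within one minute.
const recordCardUse = createAnomalyDetector(60000, 3);
```

A flagged key need not be blocked outright; it could instead trigger one of the defences discussed earlier, such as a Turing test or a higher resource metering cost.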

    Copyright 2001-2007 © Gunter Ollmann