HACKING EXPOSED WEB APPLICATIONS, 3rd Edition

How are Web Apps Attacked?

TamperIE is a Browser Helper Object (BHO) from Bayden Systems. It is really simple—its only two options are to tamper with GETs and/or POSTs.

TamperIE is a very useful tool, perhaps the only one you really need for manual web app hacking. Its GET tampering feature bypasses any restrictions imposed by the browser, and the POST tampering feature allows you to tamper with data in the body of the HTTP request that is not accessible from the browser’s address bar.

Firefox Extensions

Here are Firefox extensions for HTTP analysis and tampering, listed in order of our preference, with the most recommended first.

LiveHTTPHeaders. This Firefox plug-in, by Daniel Savard and Nikolas Coukouma, dumps raw HTTP and HTTPS traffic into a separate sidebar within the browser interface.
TamperData. This Firefox extension written by Adam Judson allows you to trace and modify HTTP and HTTPS requests, including headers and POST parameters. It can be loaded as a sidebar or as a separate window.
Modify Headers. Another Firefox extension for modifying HTTP/S requests is Modify Headers by Gareth Hunt. Modify Headers is better for persistent modification than it is for per-request manipulation.

HTTP Proxies

HTTP proxies are stand-alone programs that intercept HTTP/S communications and enable the user to analyze or tamper with the data before submitting it. They do this by running a local HTTP service and redirecting the local web client there (usually by setting the client’s proxy configuration to a high local TCP port like 8888). The local HTTP service, or proxy, acts as a “man-in-the-middle” and permits analysis and tampering with any HTTP sessions that pass through it.
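
As a minimal sketch of the client side of this arrangement, the Python snippet below points a standard-library HTTP client at a local intercepting proxy. The address 127.0.0.1:8888 mirrors the port mentioned above, the function name is hypothetical, and a proxy tool is assumed to be listening there already.

```python
import urllib.request


def build_proxied_opener(host="127.0.0.1", port=8888):
    """Return an opener that sends HTTP and HTTPS requests via a local proxy."""
    proxy_url = f"http://{host}:{port}"
    # ProxyHandler maps each URL scheme to the proxy that should carry it.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

Requests made with the returned opener's open() method would then pass through the proxy, where they can be inspected or modified before they reach the server.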

Currently available HTTP proxy tools include:

OWASP WebScarab. This tool includes an HTTP proxy, crawler/spider, session ID analysis, script interface for automation, fuzzer, encoder/decoder utility for all of the popular web formats (Base64, MD5, and so on), and a Web Services Description Language (WSDL) and SOAP parser, to name a few of its more useful modules. It is licensed under the GNU General Public License v2. Like Paros, WebScarab is written in Java and thus requires the JRE to be installed.
Fiddler. This handy tool is a free release from Eric Lawrence and Microsoft, and it’s the best non-Java freeware HTTP proxy we’ve seen.

Command-line Tools

Here are a couple of command-line tools that are good to have around for scripting and iterative attacks.

cURL

cURL is a free, multiplatform command-line tool for manipulating HTTP and HTTPS. It’s particularly powerful when scripted to perform iterative analyses.

Netcat

The “Swiss Army Knife” of network hacking, netcat is elegant for many tasks. As you might guess from its name, it most closely resembles the Unix cat utility for outputting file content. The critical difference is that netcat performs the same function for network connections: it dumps the raw input and output of network communications to the command line. Note: because it is simply a raw network tool, netcat requires a lot of manual effort when used for web application work.
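
To give a sense of that manual effort, here is a small Python sketch that assembles the raw bytes of an HTTP/1.1 request, which is roughly what you would otherwise type by hand into a netcat session. The function name and the example host are illustrative assumptions.

```python
def build_raw_request(host, path="/", method="GET", headers=None, body=""):
    """Assemble the bytes of an HTTP/1.1 request as they would go on the wire."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # The server needs the body length to know where the request ends.
        lines.append(f"Content-Length: {len(body.encode())}")
    lines.append("Connection: close")
    # A blank line separates the headers from the (possibly empty) body.
    return ("\r\n".join(lines) + "\r\n\r\n" + body).encode()
```

Every header, line ending, and length calculation is your responsibility at this level, which is exactly why the proxies and browser extensions described earlier are usually more convenient.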

Application Profiling

The purpose of surveying the application is to generate a complete picture of the content, components, function, and flow of the web site in order to gather clues about where underlying vulnerabilities might be.

Whereas an automated vulnerability checker typically searches for known vulnerable URLs, the goal of an extensive application survey is to see how each of the pieces fits together.

Manual Inspection

The first thing we usually do to profile an application is a simple click-through. Become familiar with the site, look for all the menus, and watch the directory names in the URL change as you navigate.

Web applications are complex. They may contain a dozen files, or they may contain a dozen well-populated directories.

How to Document an Application

Open a text editor or spreadsheet program to record what you find. We suggest documenting things such as:

Page name Listing files in alphabetical order makes tracking down information about a specific page easier.
Full path to the page This is the directory structure leading up to the page.
Does the page require authentication? Yes or no.
Does the page require SSL? The URI for a page may be HTTPS, but that does not necessarily mean the page cannot be accessed over normal HTTP. Put the DELETE key to work and remove the “S”!
GET/POST arguments Record the arguments that are passed to the page.
Comments Make personal notes about the page. Was it a search function, an admin function, or a Help page? Does the page “feel” insecure? Does it contain privacy information?
Some other information you should consider recording in your matrix/flowchart includes the following:
- Statically and dynamically generated pages
- Directory structure
- Common file extensions
- Common files
- Helper files
- Java classes and applets
- HTML source code
- Forms
- Query strings and parameters
- Common cookies
- Backend access points
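
One way to keep such a matrix is sketched below in Python. The record fields follow the items suggested above, but the class name, field names, and CSV layout are assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class PageRecord:
    page_name: str
    full_path: str
    requires_auth: bool
    requires_ssl: bool
    arguments: str      # GET/POST arguments passed to the page
    comments: str = ""  # personal notes about the page


def to_csv(records):
    """Render the records as CSV text, sorted alphabetically by page name."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(PageRecord)])
    writer.writeheader()
    for record in sorted(records, key=lambda r: r.page_name.lower()):
        writer.writerow(asdict(record))
    return out.getvalue()
```

The resulting text can be saved to a file and opened in any spreadsheet program.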

Statically and Dynamically Generated Pages

Static pages are the generic .html files usually relegated to FAQs and contact information. They may lack the functionality to attack with input validation tests, but the HTML source may contain comments or information. At the very least, contact information reveals e-mail addresses and usernames. Dynamically generated pages (.asp, .jsp, .php, etc.) are more interesting.

Directory Structure

The structure of a web application will usually provide a unique signature. Examining things as seemingly trivial as directory structure, file extensions, naming conventions used for parameter names or values, and so on, can reveal clues that will immediately identify what application is running.

The web server may have directories for administrators, old versions of the site, backup directories, data directories, or other directories that are not referenced in any HTML code. Try to guess the mindset of the administrators and site developers.

For example, if static content is in the /html directory and dynamic content is in the /jsp directory, then any cgi scripts may be in the /cgi directory.

Other common directories to check include these:

Directories that have supposedly been secured, either through SSL, authentication, or obscurity: /admin/ /secure/ /adm/
Directories that contain backup files or log files: /.bak/ /backup/ /back/ /log/ /logs/ /archive/ /old/
Personal Apache directories: /~root/ /~bob/ /~cthulhu/
Directories for include files: /include/ /inc/ /js/ /global/ /local/
Directories used for internationalization: /de/ /en/ /1033/ /fr/
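
The guesses above can be turned into a checklist of URLs to visit. The following Python sketch only builds the list; it sends no requests. The function name and example host are hypothetical, and the personal Apache directories are left out because they depend on usernames.

```python
from urllib.parse import urljoin

# Directory names taken from the categories listed above.
COMMON_DIRECTORIES = [
    "admin", "secure", "adm",
    ".bak", "backup", "back", "log", "logs", "archive", "old",
    "include", "inc", "js", "global", "local",
    "de", "en", "1033", "fr",
]


def candidate_urls(base_url, directories=COMMON_DIRECTORIES):
    """Return one candidate URL per directory name beneath base_url."""
    root = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(root, name + "/") for name in directories]
```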

Common Files

Most software installations will come with a number of well-known files, for instance:

Readme
ToDo
Changes
Install.txt
EULA.txt

By searching every folder and subfolder in a site, you might just hit on plenty of useful information that will tell you what applications and versions are running and a nice URL that will lead you to a download page for software and updates.

Helper Files

Helper file is a catch-all appellation for any file that supports the application but usually does not appear in the URL.

Cascading Style Sheets CSS files (.css) instruct the browser on how to format text. They rarely contain sensitive information, but enumerate them anyway.
XML Style Sheets Applications are turning to XML for data presentation. Style sheets (.xsl) define the document structure for XML requests and formatting. They tend to have a wealth of information, often listing database fields or referring to other helper files.
JavaScript Files Nearly every web application uses JavaScript (.js). Much of it is embedded in the actual HTML file, but individual files also exist. Applications use JavaScript files for everything from browser customization to session handling. In addition to enumerating these files, it is important to note what types of functions the file contains.
Include Files On IIS systems, include files (.inc) often control database access or contain variables used internally by the application. Programmers love to place database connection strings in this file—password and all!
The “Others” References to ASP, PHP, Perl, text, and other files might be in the HTML source.

HTML Source Code

HTML source code can contain numerous juicy tidbits of information.

The most obvious place attackers look is in HTML comments, special sections of source code where the authors often place informal remarks that can be quite revealing. The <!-- characters mark the start of all basic HTML comments.

Filename-like comments You will typically see plenty of comments with template filenames tucked in them. Download them and review the template code. You never know what you might find.
Old code Look for links that might be commented out. They could point to an old portion of the web site that could contain security holes. Or maybe the link points to a file that once worked, but now, when you attempt to access it, a very revealing error message is displayed.
Auto-generated comments A lot of comments that you might see are automatically generated by web content software. Take the comment to a search engine and see what other sites turn up those same comments. Hopefully, you’ll discover what software generated the comments and learn useful information.
The obvious We’ve seen things like entire SQL statements, database passwords, and actual notes left for other developers in files such as IRC chat logs within comments.

Don’t stop at comment separators. HTML source has all kinds of hidden treasures. Try searching for a few of these strings:

SQL, Select, Insert, #include, #exec, Password, Catbase, Connect, //

Other interesting things to search for in HTML are the tags that denote server-side execution, such as <? and ?> for PHP, <% and %> for ASP, and the runat=server attribute for ASP.NET pages.

The tiniest amount of information in web assessments can bring the biggest breakthroughs. So don’t let anything slide by you, no matter how insignificant it may seem at first.
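
A rough Python sketch of this kind of source review follows. It collects HTML comments with the standard-library parser and reports which search strings appear anywhere in the page. The keyword list is a subset of the strings suggested above (the // string is omitted because it matches nearly every URL), and the class and function names are illustrative.

```python
from html.parser import HTMLParser

# A subset of the search strings suggested above, compared case-insensitively.
KEYWORDS = ["sql", "select", "insert", "#include", "#exec", "password", "connect"]


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data.strip())


def review_source(html):
    """Return (comments, keyword hits) for one page of HTML source."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    lowered = html.lower()
    hits = [word for word in KEYWORDS if word in lowered]
    return collector.comments, hits
```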

Forms

Forms are the backbone of any web application.

When manually inspecting an application, note every page with an input field. You can find most of the forms by a click-through of the site. However, visual confirmation is not enough; check the source.

Tricky programmers might not use the password input type or have the words “password” or “passwd” or “pwd” in the form. You can search for a different string, although its hit rate might be lower. So when inspecting a page’s form, make notes about all of its aspects:

Method Does it use GET or POST to submit data? GET requests are easier to manipulate on the URL.
Action What script does the form call? What scripting language was used (.pl, .sh, .asp)? If you ever see a form call a script with a .sh extension (shell script), mark it. Shell scripts are notoriously insecure on web servers.
Maxlength Are input restrictions applied to the input field? Length restrictions are trivial to bypass.
Hidden Was the field supposed to be hidden from the user? What is the value of the hidden field? These fields are trivial to modify.
Autocomplete Is the autocomplete attribute applied? Why? Does the input field ask for sensitive information?
Password Is it a password field? What is the corresponding login field?
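
The aspects above can be pulled out of a page's source mechanically. This Python sketch uses the standard-library HTML parser. The dictionary layout is an assumption, and for simplicity every input is attributed to the most recently opened form.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Record the method, action, and input fields of each form in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.forms.append({
                "method": (attributes.get("method") or "GET").upper(),
                "action": attributes.get("action") or "",
                "autocomplete": attributes.get("autocomplete"),
                "inputs": [],
            })
        elif tag == "input" and self.forms:
            # Simplification: the input is assigned to the last form opened.
            self.forms[-1]["inputs"].append({
                "name": attributes.get("name"),
                "type": (attributes.get("type") or "text").lower(),
                "value": attributes.get("value"),
                "maxlength": attributes.get("maxlength"),
            })


def inspect_forms(html):
    inspector = FormInspector()
    inspector.feed(html)
    inspector.close()
    return inspector.forms
```

Hidden fields, password fields, and maxlength restrictions all show up in the inputs list, ready to be copied into your notes.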

Query Strings and Parameters

Perhaps the most important part of a given URL is the query string, the part following the question mark that indicates some sort of arguments or parameters being fed to a dynamic executable or library within the application.

You can manipulate parameter values to attempt to impersonate other users, obtain restricted data, run arbitrary system commands, or execute other actions not intended by the application developers. Parameter names may also provide information about the internal workings of the application. They may represent database column names, be obvious session IDs, or contain the username. The application manages these strings, although it may not validate them properly.

Depending on the application or how the application is tailored, parameters have a recognizable look and implementation that you should be watching for.

Collecting query strings and parameters is a complicated task that is rarely the same between two applications. As you collect the variable names and values, watch for certain trends.

Here are some other common query string/parameter “themes” that might indicate potentially vulnerable application logic:

User identification Look for values that represent the user. This could be a username, a number, the user’s social security number, or another value that appears to be tied to the user. This information is used for impersonation attacks. Relevant strings are userid, username, user, usr, name, id, uid.
Session identification Look for values that remain constant for an entire session. Cookies also perform session handling. Some applications may pass session information on the URL. Relevant strings are sessionid, session, sid, and s.
Database queries Inspect the URL for any values that appear to be passed into a database. Common values are name, address information, preferences, or other user input. These are perfect candidates for input validation and SQL injection attacks.
Look for encoded/encrypted values Don’t be intimidated by a complex-looking value string in a parameter.
Boolean arguments These are easy to tamper with since the universe of possible values is typically quite small. For example, with Boolean arguments such as “debug,” attackers might try setting their values to TRUE, T, or 1. Other Boolean parameters include dbg, admin, source, and show.
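
As a hedged illustration, the Python sketch below tags the parameters of a URL with the themes described above, using the relevant strings from the list. The theme labels, function name, and example URL are assumptions. Matching on names alone is only a first pass and does not replace looking at the values.

```python
from urllib.parse import urlsplit, parse_qsl

# Parameter names drawn from the themes listed above.
THEMES = {
    "user identification": {"userid", "username", "user", "usr", "name", "id", "uid"},
    "session identification": {"sessionid", "session", "sid", "s"},
    "boolean argument": {"debug", "dbg", "admin", "source", "show"},
}


def classify_parameters(url):
    """Return (name, value, theme) for each parameter matching a known theme."""
    findings = []
    query = urlsplit(url).query
    for name, value in parse_qsl(query, keep_blank_values=True):
        for theme, names in THEMES.items():
            if name.lower() in names:
                findings.append((name, value, theme))
    return findings
```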

Robots.txt

The robots.txt file contains a list of directories that search engines such as Google are supposed to index or ignore. The file might even be on Google, or you can retrieve it from the site itself.

The point is that a robots.txt file provides an excellent snapshot of the directory structure—and maybe even some clear pointers toward misconfigurations that can be exploited later.
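
Extracting the interesting paths from a retrieved robots.txt file takes only a few lines. This Python sketch assumes the file's text is already in hand and simply lists the non-empty Disallow entries. The function name is hypothetical.

```python
def disallowed_paths(robots_txt):
    """Return the paths named in non-empty Disallow lines, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths
```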

Automated Web Crawling

One of the most fundamental and powerful techniques used in profiling is the mirroring of the entire application to a local copy that can be scrutinized slowly and carefully. We call this process web crawling, and web crawling tools are an absolute necessity when it comes to large-scale web security assessments. Your web crawling results will create your knowledge baseline for your attacks, and this baseline is the most important aspect of any web application assessment.
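
The core step of any crawler is finding the links on a page that stay within the target site. The following Python sketch shows that single step under simplifying assumptions: it looks only at anchor tags, treats "same site" as "same host", and leaves fetching and storing pages to the caller. All names are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urldefrag


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(page_url, html):
    """Return absolute, de-duplicated links that share page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    host = urlsplit(page_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlsplit(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links
```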

The information you glean will help you to identify the overall architecture of your target, including all of the important details of how the web application is structured, input points, directory structures, and so on. Some other key positives of web crawling include the following:

Spares tons of manual labor!
Provides an easily browseable, locally cached copy of all web application components, including static pages, executables, forms, and so on.
Enables easy global keyword searches on the mirrored content (think “password” and other tantalizing search terms).
Provides a high-level snapshot that can easily reveal things such as naming conventions used for directories, files, and parameters.

Here are some things that web crawling doesn’t do very well:

Forms Crawlers, being automated things, often don’t deal well with filling in web forms designed for human interaction.
Complex flows Some sites with unorthodox layouts may defy simple interpretation by a crawler and require that a human manually click through the site.
Client-side code This problem with client-side code is usually found in free and cheap web crawlers.
State problems We suggest that you profile the authenticated portions of the website manually or look to a web security assessment product when your target site requires that you maintain state. No freeware crawler will do an adequate job for you.
Broken HTML/HTTP
Web services

General Countermeasures

After seeing what information is commonly leaked by web applications, you may be tempted to excise a great deal of content and functionality from your site.

Most information leakage can be stopped at the server level through strong configurations and least-privilege access policies. Keep in mind that web applications are designed to provide information to users. Just because a user can download the application’s local.js file doesn’t mean the application has a poor design; however, if the local.js file contains the username and password to the application’s database, then the system is going to be broken.

Protecting Directories

As we saw many times throughout this chapter, directories are the first line of defense against prying profilers. Here are some tips for keeping them sealed.

Location Headers

You can limit the contents of the Location header in the redirect so it doesn’t display the web server IP address, which can point attackers toward discrete servers with misconfigurations or vulnerabilities.

Protecting include Files

The best protection for all types of include files is to ensure that they do not contain passwords. This might sound trivial, but anytime a password is placed in a file in clear text, expect that password to be compromised.

Miscellaneous Tips

The following tips will help your web application resist the surveying techniques we’ve described in this chapter:

Consolidate all JavaScript files to a single directory.
Strip developer comments. A test environment should exist that is not Internet-facing where developer comments can remain in the code for debugging purposes.
If a file must call any other file on the web server, then use path names relative to the web root or the current directory. Do not use full path names that include drive letters or directories outside of the web document root.
If the site requires authentication, ensure authentication is applied to the entire directory and its subdirectories.

Best Practices

Implement Aggressive Network Access Control—in Both Directions!

TCP port 80 (and port 443, if you implement SSL/TLS) should be the only ports that you make available to general audiences in the inbound direction.

Although inbound filtering is broadly appreciated, one common mistake is to ignore outbound access control. One of the first things attackers will seek to do once they’ve gained the ability to run arbitrary commands on a web server is to “shovel” an outbound shell, or make an outbound connection to upload more files to the victim.

The simplest rule is to deny all outbound connections except those that are established, which can be implemented by blocking all packets bearing only a TCP SYN flag.
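
The decision behind that rule can be modeled in a few lines of Python. This is only a sketch of the logic: real enforcement belongs in a firewall or router, and the function names are hypothetical. The flag values are the standard TCP header bits.

```python
# Standard TCP header flag bits.
SYN = 0x02
ACK = 0x10


def is_new_connection_attempt(tcp_flags):
    """True when SYN is set and ACK is not: the first packet of a handshake."""
    return bool(tcp_flags & SYN) and not (tcp_flags & ACK)


def allow_outbound(tcp_flags):
    """Permit outbound packets unless they try to open a new connection."""
    return not is_new_connection_attempt(tcp_flags)
```

A reply to an inbound connection carries SYN and ACK together, so it passes, while a connection initiated from the web server itself carries a bare SYN and is dropped.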

It’s important to note that sophisticated attackers may be able to hijack legitimate outbound connectivity to bypass outbound filtering. However, in our experience, this is difficult to achieve in practice, and establishing rigorous outbound access control remains one of the most important defensive layers you can implement for your web servers.

Keep Up with Security Patches

The most effective way to maintain a strong and secure web platform is to keep the system up-to-date with security patches. There’s no shortcut: you must continuously patch your platforms and applications.

Don’t Put Private Data in Source Code

Some of the most common failures include these:

Cleartext SQL connect strings in ASP scripts Use SQL integrated security or a binary COM object instead.
Using cleartext passwords in application configuration files Always avoid cleartext passwords in application configuration files.
Using include files with the .inc extension Rename include files to .asp, .php, or the appropriate extension for your web application platform.
Comments within scripts that contain private information like e-mail addresses, directory structure information, and passwords.

Regularly Scan Your Network for Vulnerable Servers

The best mechanism for preventing such compromises is to regularly scan for the vulnerabilities that make those compromises possible. A number of very useful web application assessment products are available, such as HP WebInspect and Watchfire AppScan.

Apache Hardening

Apache comes fairly secure right out of the box, and the Apache group does a good job at fixing most security problems quickly.

Disable Unneeded Modules

One of the most important things to consider when installing Apache is what types of functionality the webserver needs to have. For instance, are PHP scripts or Perl scripts going to be run? Will Server Side Includes be used in the application running on the webserver? Once you can create a list of needed functionality, you can enable the appropriate modules.

Implement ModSecurity

ModSecurity is an Apache module written by Ivan Ristic that works as a web application firewall. It has a huge amount of flexibility and is considered one of the best projects available in terms of helping to secure Apache against application and web platform attacks. Some of ModSecurity’s features are listed here:

• Request filtering

• Anti-evasion techniques

• HTTP filtering rules

• Full audit logging

• HTTPS intercepting

• Chroot functionality

• Mask web server identity

Chrooting Apache

One of the standard rules in security is to practice defense in depth. When attackers break into a web server, one of the first things the attackers will do is attempt to access files on the system such as /etc/passwd, or escalate their privileges via a local exploit. In order to prevent this type of attack, a method of putting the Apache server in a contained environment, or “jail” of sorts, has been created, and it is called chrooting. By implementing this, Apache runs with limited privileges inside of its own contained file system. If attackers were to gain access to the file system, they would be stuck inside this jail environment with no access to the real file system. There are two methods to chrooting Apache that we’ll review here.

External Chrooting

This type of chrooting starts out with a file system that contains nothing but the basic shell. All processes and required dependencies need to be copied to this environment in order to run. This is a real containment method for Apache in that if an attacker breaks into a shell somehow, he has nowhere to go. The method to set up and configure this kind of jail is complex and requires research, depending on what software is required to run with the web application. To find out more detailed steps on how to set up this environment, see the “References & Further Reading” section at the end of this chapter.

Internal Chrooting

Internal chrooting is different from external chrooting in that during internal chrooting, the chroot is created from inside the Apache process. Apache starts out and initializes normally but then creates a chroot environment for the process to run. By default, Apache does not support this kind of chroot method. However, a couple of people have created third-party add-ons that enable Apache to support this.

• ModSecurity supports a chroot environment via its SecChrootDir configuration. Just set the value to the directory where you would like Apache to be jailed.

• ModChroot is an Apache module that works in the same manner as the ModSecurity chroot. Just set the ChrootDir to the proper directory.

• Apache chroot(2) patch by Arjan De Vet is an actual patch to Apache that enables support for internal chrooting.

PHP Best Practices

Since we discussed a number of vulnerabilities in the popular PHP scripting platform, here are a few tips on making sure you avoid them:

Apply strict input validation to all user input.
Use eval(), passthru(), system(), and other functions sparingly and without user input.
Turn register_globals off.

Common Security Options for PHP

The following configuration options are security-related and can be set in the php.ini file. Using these settings ensures that the PHP configuration you have running is securely set by default.

open_basedir

This setting will restrict any file access to a specified directory.

disable_functions

This allows a set of functions to be disabled in PHP. Disabling functions is considered a great way to practice defense in depth. If the applications don’t make use of security-risky functions such as eval(), passthru(), system(), etc., then add these as functions that should never be allowed.

expose_php

Setting this configuration to off will remove the PHP banner that displays in the server headers on an HTTP response.

display_errors

This setting is a simple but important configuration that enables detailed error information to be displayed to the user on an exception. This setting should always be turned off in any production environment.

safe_mode

Turning safe_mode on in PHP allows very strict file access permissions. It does this by checking the permissions of the owner of the PHP script that is running and any file access that the script attempts. If the permissions do not match, then PHP throws a security exception.

allow_url_fopen

Setting this configuration option to off disables the ability to perform file operations on remote files. This is a useful general setting for preventing remote file inclusion vulnerabilities from working.
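
A quick way to check a configuration against these recommendations is sketched below in Python. It assumes a simple php.ini made of key = value lines under a section header such as [PHP]. It covers only a subset of the options above, and it reports settings that are absent from the file because it knows nothing about PHP's built-in defaults. The function and constant names are illustrative.

```python
import configparser

OFF_VALUES = {"off", "0", "false", "no"}

# Recommended states from the options above; "set" means any non-empty value.
RECOMMENDED = {
    "expose_php": "off",
    "display_errors": "off",
    "allow_url_fopen": "off",
    "register_globals": "off",
    "open_basedir": "set",
    "disable_functions": "set",
}


def audit_php_ini(text):
    """Return the names of settings that do not match the recommendations."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    settings = {}
    for section in parser.sections():
        settings.update(parser[section])
    findings = []
    for name, expected in RECOMMENDED.items():
        actual = settings.get(name, "").strip().strip('"').lower()
        if expected == "set":
            compliant = bool(actual)
        else:
            compliant = actual in OFF_VALUES
        if not compliant:
            findings.append(name)
    return findings
```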

Web Authentication Threats

Username/password Because of its simplicity, this is the most prevalent form of authentication on the Web.
Strong(er) authentication Since it’s widely recognized that username/password authentication has fundamental weaknesses, many web sites are beginning to provide stronger forms of authentication for their users, including token- and certificate-based authentication.
Authentication services Many web sites outsource their authentication to Internet services.

Countermeasures for Password Guessing

The most effective countermeasure against password guessing is a combination of a strong password policy and a strong account lockout policy. After a small number of unsuccessful login attempts, the application should lock the account to limit the exposure to this type of attack.

A good compromise that many application developers choose is to temporarily lock the account for a small period of time, say ten minutes. This slows down the rate of password guessing, thereby hindering the effectiveness of password-guessing attacks.
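
A temporary lockout of this kind might be sketched as follows in Python. The ten-minute lock comes from the text above. The failure threshold of five, the in-memory storage, and all names are assumptions, and the caller supplies the current time in seconds so the logic is easy to test.

```python
class LockoutPolicy:
    """Lock an account temporarily after repeated failed logins."""

    def __init__(self, max_failures=5, lock_seconds=600):
        self.max_failures = max_failures
        self.lock_seconds = lock_seconds
        self._failures = {}      # username -> consecutive failed attempts
        self._locked_until = {}  # username -> time at which the lock expires

    def is_locked(self, username, now):
        return now < self._locked_until.get(username, 0)

    def record_failure(self, username, now):
        count = self._failures.get(username, 0) + 1
        self._failures[username] = count
        if count >= self.max_failures:
            # Start the lock and reset the counter for the next window.
            self._locked_until[username] = now + self.lock_seconds
            self._failures[username] = 0

    def record_success(self, username):
        self._failures.pop(username, None)
```

The login handler would call is_locked before checking the password and refuse the attempt outright while the lock is active.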

Summary

Authentication plays a critical role in the security of any website with sensitive or confidential information.

A strong password policy and account lockout policy will render most attacks based on password guessing useless.
Ensure that all sections of the application requiring authentication are actually covered by the authentication component and that authentication cannot be bypassed by brute-forcing to the resource.
Do not use personally identifiable information for credentials! They aren’t really secret, and they expose your business to liability if you store them.
HTTPS should be used to protect authentication transactions from the risk of eavesdropping and replay attacks.
Input validation goes a long way in preventing hacking on a web site. SQL injection, script injection, and command execution can all be prevented if input validation is properly performed.
Ensure that authentication security tokens like session identifiers aren’t easily predictable and that they are generated using a sufficiently large key space that cannot easily be guessed.
Do not allow users to preset session IDs prior to authentication (the server should always generate these values), and always issue a new session ID upon successful authentication.
Do not forget to harden identity management systems like account registration and credential reset, as weaknesses in these systems can bypass authentication controls altogether.
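
The two session ID recommendations above can be illustrated with a short Python sketch. It draws IDs from the standard-library secrets module and issues a fresh ID at login, so an ID chosen or observed before authentication is never promoted to an authenticated session. The in-memory store and its method names are assumptions for illustration.

```python
import secrets


def new_session_id(num_bytes=32):
    """Return an unpredictable, URL-safe session ID from a large key space."""
    return secrets.token_urlsafe(num_bytes)


class SessionStore:
    def __init__(self):
        self._sessions = {}  # session ID -> username (None before login)

    def start_anonymous(self):
        sid = new_session_id()
        self._sessions[sid] = None
        return sid

    def login(self, old_sid, username):
        # Discard the pre-login ID, whether or not the server issued it,
        # and always hand back a newly generated one.
        self._sessions.pop(old_sid, None)
        sid = new_session_id()
        self._sessions[sid] = username
        return sid

    def user_for(self, sid):
        return self._sessions.get(sid)
```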

Authorization Best Practices

We’ve covered a lot of web app authorization attacks. Now, how do we mitigate all of those techniques?

Apache Authorization

The Apache web server uses two different directives to control user access to specific URLs. The Directory directive is used when access control is based on file paths. For example, the following set of directives limits access to the /admin URL. Only valid users who are also in the admin group can access this directory.

Web Authorization/Session Token Security

Best practices for authorization and session management:

Use SSL. Any traffic that contains sensitive information should be encrypted to prevent sniffing attacks.
Mark cookies using the Secure parameter of the Set-Cookie response header, per RFC 2109.
Don’t roll your own authz.
Don’t include personally sensitive data in the token.
Regenerate session IDs upon privilege changes.
Enforce session time limits to close down the window for replay attacks.
Enforce concurrent login limits.
Perform strict input validation.
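
For the Secure cookie recommendation, a minimal Python sketch using the standard-library cookie module looks like this. The cookie name and value are placeholders, and the function name is hypothetical.

```python
from http.cookies import SimpleCookie


def session_cookie_header(name, value):
    """Return a Set-Cookie header value marked Secure."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["secure"] = True  # browser sends it over HTTPS only
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()
```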

Security Logs

Another access control countermeasure that often gets overlooked is security logging. The web application’s platform should already be generating logs for the operating system and web server. Unfortunately, these logs can be grossly inadequate for identifying malicious activity or re-creating a suspect event. Many additional events affect the user’s account and should be tracked, especially when dealing with financial applications:

Profile changes Record changes to significant personal information such as phone number, address, credit card information, and e-mail address.
Password changes Record any time the user’s password is changed. Optionally, notify the user at their last known good e-mail address. (Yahoo! does this, for example.)
Modify other user Record any time an administrator changes someone else’s profile or password information. This could also be triggered when other users, such as help desk employees, update other users’ information. Record the account that performed the change and the account that was changed.
Add/delete user Record any time users are added to or removed from the system.

The application should log as much detail as possible, although there must be a balance between the amount and type of information logged. At a minimum, log the information that identifies the user who originated the request: the source IP address, the username and any other identification tokens, and the date and time the event occurred.
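
A log entry carrying that minimum information might be built as in the Python sketch below. The field names, the JSON format, and the example event names are assumptions, not a prescribed schema.

```python
import json
from datetime import datetime, timezone


def security_event(event_type, actor, source_ip, target=None, when=None):
    """Return one JSON log line describing an account-related event."""
    when = when or datetime.now(timezone.utc)
    record = {
        "timestamp": when.isoformat(),
        "event": event_type,        # e.g. "password_change", "add_user"
        "actor": actor,             # account that performed the change
        "target": target or actor,  # account that was changed
        "source_ip": source_ip,
    }
    return json.dumps(record, sort_keys=True)
```

Recording both the actor and the target covers the case where an administrator or help desk employee changes another user's information.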

Common Input Injection Attacks

Let’s examine some common input validation attack payloads. Even though many of the attacks merely dump garbage characters into the application, other payloads contain specially crafted strings.

Buffer Overflow

Buffer overflows are less likely to appear in applications written in interpreted or high-level programming languages. For example, you would be hard-pressed to write an application vulnerable to buffer overflows in PHP or Java.

To execute a buffer overflow attack, you merely dump as much data as possible into an input field. This is the most brutish and inelegant of attacks, but useful when it returns an application error. Perl is well suited for conducting this type of attack.

The rule of thumb for buffer overflow testing is to follow basic differential analysis or anomaly detection:

1. Send a normal request to an application and record the server’s response.

2. Send the first buffer overflow test to the application, and record the server’s response.

3. Send the next buffer, and record the server’s response.

4. Repeat step 3 as necessary.

Whenever the server’s response differs from that of a “normal” request, examine what has changed. This helps you track down the specific payload that produces an error (such as 7,809 slashes on the URL are acceptable, but 7,810 are not).
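
The differential approach above can be sketched in Python (used here in place of Perl for consistency with the other examples). The send argument is a hypothetical callable that submits one value to the application under test and returns a summary of the response, such as the status code. Supplying it is left to the tester.

```python
def find_anomaly(send, normal_value, lengths, filler="A"):
    """Return (length, response) for the first payload whose response differs
    from the baseline, or None if every response matches it."""
    baseline = send(normal_value)  # step 1: record the normal response
    for length in lengths:         # steps 2 to 4: send successive buffers
        response = send(filler * length)
        if response != baseline:
            return length, response
    return None
```

Once an anomaly turns up, narrowing the lengths between the last normal payload and the first anomalous one pins down the exact boundary.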

In some cases, the buffer overflow attack enables the attacker to execute arbitrary commands on the server. This task is more difficult to produce once, but simple to replicate. In other words, experienced security auditing is required to find a vulnerability and to create an exploit, but an unsophisticated attacker can download and run a premade exploit.

Canonicalization (dot-dot-slash)

These attacks target pages that use template files or otherwise reference alternate files on the web server. The basic form of this attack is to move outside of the web document root in order to access system files, for example, “../../../../../../../../../boot.ini”. The web server itself (IIS or Apache, for example) is hopefully smart enough to stop this. Even so, IIS fell victim to such problems due to logical missteps in decoding URL characters and performing directory traversal security checks. Two well-known examples are the IIS Superfluous Decode (..%255c..) and IIS Unicode Directory Traversal (..%c0%af..) vulnerabilities.

TIP

Many embedded devices, media servers, and other Internet-connected devices have rudimentary web servers—take a look at many routers and wireless access points sold for home networks. When confronted by one of these servers, always try a simple directory traversal on the URL to see what happens. All too often security plays second fiddle to application size and performance!

Navigating Without Directory Listings

Canonicalization attacks allow directory traversal inside and outside of the web document root.

Error codes can also help us enumerate directories. We’ll use information such as “Path not found” and “Permission denied” to track down the directories that exist on a web server.

Steps for enumerating files:

1. Examine error codes. Determine if the application returns different errors for files that do not exist, directories that do not exist, files that exist (but perhaps have read access denied), and directories that exist.

2. Find the root. Add directory traversal characters until you can determine where the drive letter or root directory starts.

3. Move down the web document root. Files in the web document root are easy to enumerate. You should already have listed most of them when first surveying the application. These files are easier to find because they are a known quantity.

4. Find common directories. Look for temporary directories (/temp, /tmp, /var), program directories (/Program Files, /winnt, /bin, /usr/bin), and popular directories (/home, /etc, /downloads, /backup).

5. Try to access directory names. If the application has read access to the directory, it will list the directory contents. This makes file enumeration easy!

Canonicalization Countermeasures

The best defense against canonicalization attacks is to remove all dots (.) from GET and POST parameters. The parsing engine should also catch dots represented in Unicode and hexadecimal.

Force all reads to happen from a specific directory. Apply regular expression filters that remove all path information preceding the expected filename. For example, reduce /path1/path2/./path3/file to /file.

Secure filesystem permissions also mitigate this attack.
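A minimal Python sketch of forcing all reads into one directory follows. The directory path and the function name are hypothetical, and the sketch uses os.path.basename in place of a regular expression to drop the leading path information. It then resolves the result and confirms that it still sits under the base directory, which also catches symbolic links that point elsewhere.

```python
import os

BASE_DIR = "/var/www/templates"  # hypothetical template directory


def resolve_template(user_value: str, base_dir: str = BASE_DIR) -> str:
    """Map a user-supplied name to a path inside base_dir, or raise ValueError."""
    # Treat backslashes as separators too, then drop all path information.
    name = os.path.basename(user_value.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError("no usable filename")
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, name))
    # Defense in depth: the resolved path must still be under base_dir.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes base directory")
    return candidate


# Both values are reduced to a bare filename inside the base directory.
print(resolve_template("/path1/path2/./path3/file"))
print(resolve_template("../../../../boot.ini"))
```

Input should be decoded to its canonical form before it reaches a function like this one; otherwise an encoded separator could slip past the basename step.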

HTML Injection

Script attacks include any method of submitting HTML-formatted strings to an application that subsequently renders those tags.

Embedded Scripts

Embedded script attacks lack the popularity of cross-site scripting, but they are not necessarily rarer. An XSS attack targets other users of the application. An embedded script attack targets the application itself. In this case, the malicious code is not a pair of script tags, but formatting tags. This includes SSI directives, ASP brackets, PHP brackets, SQL query structures, or even HTML tags. The goal is to submit data that, when displayed by the application, executes as a program instruction or mangles the HTML output. Program execution can enable the attacker to access server variables such as passwords and files outside of the web document root. Needless to say, an embedded script poses a major risk to the application. If the embedded script merely mangles the HTML output, then the attacker may be presented with source code that did not execute properly. This can still expose sensitive application data.

Execution tests fall into several categories. An application audit does not require complex tests or malicious code. If an injected ASP date() function returns the current date, then the application’s input validation routine is inadequate. ASP code is very dangerous because it can execute arbitrary commands or access arbitrary files:

Cookies and Predefined Headers

Web application testers always review cookie contents. Cookies, after all, can be manipulated to impersonate other users or to escalate privileges. The application must read the cookie; therefore, cookies are an equally valid test bed for script attacks. In fact, many applications interpret additional information that is particular to your browser. The HTTP 1.1 specification defines a User-Agent header that identifies the web browser. You usually see some form of “Mozilla” in this string.

HTML Injection Countermeasures

The most significant defense against script attacks is to turn all angle brackets into their HTML-encoded equivalents. The left bracket, <, is represented by &lt; and the right bracket, >, is represented by &gt;. This ensures the brackets are always stored and displayed in an innocuous manner. A web browser will never execute a script tag whose brackets have been encoded this way.
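In Python, for instance, the standard library’s html.escape performs this encoding. The wrapper name below is an assumption for illustration; note that html.escape also encodes the ampersand and, with quote=True, both quote characters.

```python
import html


def encode_for_html(value: str) -> str:
    """HTML-encode angle brackets (plus & and quotes) before display."""
    return html.escape(value, quote=True)


print(encode_for_html("<script>alert(1)</script>"))
# &lt;script&gt;alert(1)&lt;/script&gt;
```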

Some applications intend to let users specify certain HTML tags such as bold, italics, and underline. In these cases, use regular expressions to validate the data. These checks should be inclusive, rather than exclusive. In other words, they should only look for acceptable tags, permit those tags, and HTML-encode all remaining brackets. For example, an inadequate regular expression that tries to catch script tags can be tricked:
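The contrast between exclusive and inclusive checks can be sketched in Python. This is one illustration of the idea and not necessarily the case the authors had in mind: `naive_strip` is a hypothetical blacklist filter that deletes script tags in a single pass, and `allow_basic_tags` is a hypothetical inclusive filter that encodes everything and then restores only exact bold, italic, and underline tags.

```python
import html
import re


def naive_strip(value: str) -> str:
    """Exclusive (blacklist) filter: delete script tags in one pass."""
    return re.sub(r"</?script>", "", value, flags=re.IGNORECASE)


# Removing the inner tags reassembles the outer ones.
print(naive_strip("<scr<script>ipt>alert(1)</scr</script>ipt>"))
# <script>alert(1)</script>

# Inclusive (whitelist) pattern: only bare <b>, <i>, <u> and their closers.
ALLOWED = re.compile(r"&lt;(/?)(b|i|u)&gt;", re.IGNORECASE)


def allow_basic_tags(value: str) -> str:
    """Encode every bracket, then restore only the permitted tags."""
    encoded = html.escape(value, quote=True)
    return ALLOWED.sub(r"<\1\2>", encoded)


print(allow_basic_tags("<b>hi</b><script>x</script>"))
# <b>hi</b>&lt;script&gt;x&lt;/script&gt;
```

Because the inclusive filter matches only tags with no attributes, a value such as a bold tag carrying an onclick handler stays encoded.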

Boundary Checks

Numeric fields have much potential for misuse. Even if the application properly restricts the data to numeric values, some of those values may still cause an error. Boundary checking is the simple technique of trying the extremes of a value. Swapping out UserID=19237 for UserID=0 or UserID=-1 may generate informational errors or strange behavior. The upper bound should also be checked. An unsigned one-byte value cannot be greater than 255. An unsigned two-byte value cannot be greater than 65,535.
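The limits follow from the storage size: an unsigned value of n bytes tops out at 2^(8n) - 1. A short Python sketch (the function name is an assumption) that produces the values worth trying on either side of each bound:

```python
def boundary_values(num_bytes: int) -> list:
    """Test values around the limits of an unsigned integer of num_bytes bytes."""
    max_unsigned = 2 ** (8 * num_bytes) - 1
    return [-1, 0, 1, max_unsigned - 1, max_unsigned, max_unsigned + 1]


print(boundary_values(1))  # [-1, 0, 1, 254, 255, 256]
print(boundary_values(2))  # [-1, 0, 1, 65534, 65535, 65536]
```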

Manipulate Application Behavior

Some applications may have special directives that the developers used to perform tests. One of the most prominent is debug=1. Appending this to a GET or POST request could return more information about variables, the system, or backend database connectivity. A successful attack may require some combination of a parameter name such as debug or dbg and a value such as true, T, or 1.

Search Engines

The mighty percent (%) often represents a wildcard match in SQL or search engines. Submitting the percent symbol in a search field might return the entire database content, or generate an informational error.

SQL also uses the underscore (_) to represent a single-character wildcard match. Web applications that employ LDAP backends may also be exposed to similar attacks based on the asterisk (*), which represents a wildcard match in that protocol.
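The SQL wildcard behavior is easy to reproduce with Python’s built-in sqlite3 module. The table, its contents, and the `search` function are made up for illustration; the point is only that a search term used directly as a LIKE pattern gives the percent sign and underscore their wildcard meaning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany("INSERT INTO products VALUES (?)",
                 [("cat",), ("cot",), ("coat",), ("dog",)])


def search(term: str) -> list:
    """Use the term directly as a LIKE pattern, as many search forms do."""
    rows = conn.execute(
        "SELECT name FROM products WHERE name LIKE ? ORDER BY name", (term,))
    return [row[0] for row in rows]


print(search("cat"))  # ['cat']
print(search("%"))    # ['cat', 'coat', 'cot', 'dog']
print(search("c_t"))  # ['cat', 'cot']
```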

SQL Injection

One very popular attack that targets an application’s backend database is SQL injection. SQL injection is a style of code injection. Unlike XSS, which typically uses JavaScript to target the browser, SQL injection targets the SQL statement being executed by the application on the backend database. This attack involves injecting SQL into a dynamically constructed query that is then run on the backend database. Most commonly, the malicious input is concatenated directly into a SQL statement within the application code, but SQL injection can also occur within stored procedures. By injecting SQL syntax, an attacker can modify the logic of the statement so that it performs a different action when executed.

A quick test on a user input field that is used to query a database is to send a single quotation mark on the end of the value. In SQL syntax, the single quote delimits the start or end of a string value. Thus, when the single quote is injected into a vulnerable SQL statement, it has the potential to disrupt the pairing of string delimiters and generate an application error, which indicates a potential SQL injection vulnerability.

SQL injection vulnerabilities may be found in any application parameter that influences a database query. Attack points include the URL parameters, POST data, and cookie values. The simplest way to identify a SQL injection vulnerability is to add invalid or unexpected characters to a parameter value and watch for errors in the application’s response. This syntax-based approach is most effective when the application doesn’t suppress error messages from the database. When errors are suppressed (or some simple input validation is present), vulnerabilities can still be identified through semantic techniques that test how the application behaves when given valid SQL constructs.

Syntax tests involve injecting characters into a parameter with the intent of disrupting the syntax of the database query. The goal is to find a character that generates an error when the query is executed by the database, and is then propagated back through the application and returned in the server’s response. We’ll start with the most common injection character, the single quote ('). Remember the single quote is used to delineate string values in a SQL statement. Our first SQL injection test looks like this:

Subqueries

Subqueries can retrieve information ranging from Boolean indicators (whether a record exists or is equal to some value) to arbitrary data (a complete record). Subqueries are also a good technique for semantic-based vulnerability identification. A properly designed subquery enables the attacker to infer whether a request succeeded or not.

The simplest subqueries use the logical AND operator to force a query to be false or to keep it true:

AND 1=1

AND 1=0

Now, the important thing is that the subquery be injected such that the query’s original syntax suffers no disruption. Injecting into a simple query is easy:

SELECT price FROM Products WHERE ProductId=5436 AND 1=1
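The inference can be reproduced against an in-memory SQLite database in Python. The table contents and the `lookup_vulnerable` function are assumptions for illustration; the function builds its query by string concatenation, as a vulnerable application might.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductId INTEGER, price REAL)")
conn.execute("INSERT INTO Products VALUES (5436, 19.99)")


def lookup_vulnerable(product_id: str) -> list:
    """Concatenate the parameter straight into the SQL statement."""
    query = "SELECT price FROM Products WHERE ProductId=" + product_id
    return conn.execute(query).fetchall()


print(lookup_vulnerable("5436"))          # [(19.99,)]  normal response
print(lookup_vulnerable("5436 AND 1=1"))  # [(19.99,)]  query kept true
print(lookup_vulnerable("5436 AND 1=0"))  # []          query forced false
```

A response that matches the baseline for the first payload and changes for the second shows that the appended text was evaluated as SQL rather than treated as data.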

More complex queries that have several levels of parentheses and clauses with JOINs might not be as easy to inject with that basic method. In this case, we alter the approach and focus on creating a subquery from which we can infer some piece of information. For example, here’s a simple rewrite of the example query:

SELECT price FROM Products WHERE ProductId=(SELECT 5436)

We can avoid most problems with disrupting syntax by using the (SELECT foo) subquery technique and expanding it into more useful tests. We don’t often have access to the original query’s syntax, but the syntax of the subquery, like SELECT foo, is one of our making. In this case, we need not worry about matching the number of opening or closing parentheses or other characters. When a subquery is used as a value, its content is resolved before the rest of the query. In the following example, we try to count the number of users in the default mysql.user table whose name equals “root”. If there is only one entry, then we’ll see the same response as when using the value 5436 (5435+1 = 5436).

Subqueries take advantage of complex SQL constructs to infer the value of a SELECT statement. They are limited only by internal data access controls and the characters that can be included in the payload.
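The subquery-as-value idea can be sketched the same way with Python’s sqlite3 module. SQLite has no mysql.user table, so a made-up `app_users` table stands in for it here; the product row and the `price_for` function are likewise assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductId INTEGER, price REAL)")
conn.execute("INSERT INTO Products VALUES (5436, 19.99)")
# Stand-in for mysql.user, which does not exist in SQLite.
conn.execute("CREATE TABLE app_users (name TEXT)")
conn.execute("INSERT INTO app_users VALUES ('root')")


def price_for(product_id_expr: str) -> list:
    """Concatenate the parameter into the query, as a vulnerable app might."""
    query = "SELECT price FROM Products WHERE ProductId=" + product_id_expr
    return conn.execute(query).fetchall()


baseline = price_for("5436")
rewritten = price_for("(SELECT 5436)")
# The inner COUNT(*) resolves first: 5435 + 1 = 5436 when one row matches.
one_match = price_for(
    "(SELECT 5435+(SELECT COUNT(*) FROM app_users WHERE name='root'))")
# No matching row gives 5435 + 0, so the product lookup comes back empty.
no_match = price_for(
    "(SELECT 5435+(SELECT COUNT(*) FROM app_users WHERE name='nobody'))")

print(baseline, rewritten, one_match, no_match)
```

Seeing the baseline response for a payload means the inner count was exactly one; any other response means it was not.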

COMMON COUNTERMEASURES

We’ve already covered several countermeasures during our discussion of input validation attacks. However, it’s important to reiterate several key points to stopping these attacks:

• Use client-side validation for performance, not security. Client-side input validation mechanisms prevent innocent input errors and typos from reaching the server. This preemptive validation step can reduce the load on a server by preventing unintentionally bad data from reaching the server. A malicious user can easily bypass client-side validation controls, so they should always be complemented with server-side controls.

• Normalize input values. Many attacks have dozens of alternate encodings based on character sets and hexadecimal representation. Input data should be canonicalized before security and validation checks are applied to them. Otherwise, an encoded payload may pass a filter only to be decoded as a malicious payload at a later step. This step also includes measures taken to canonicalize file- and pathnames.

• Apply server-side input validation. All data from the web browser can be modified with arbitrary content. Therefore, proper input validation must be done on the server, where it is not possible to bypass validation functions.

• Constrain data types. The application shouldn’t even deal with data that don’t meet basic type, format, and length requirements. For example, numeric values should be assigned to numeric data structures and string values should be assigned to string data structures. Furthermore, a U.S. ZIP code field should accept only numeric values that are exactly five digits long (or that match the “ZIP plus four” format).

• Use secure character encoding and “output validation.” Characters used in HTML and SQL formatting should be encoded in a manner that will prevent the application from misinterpreting them. For example, present angle brackets in their HTML-encoded form (&lt; and &gt;). This type of output validation or character reformatting serves as an additional layer of security against HTML injection attacks. Even if a malicious payload successfully passes through an input filter, its effect is negated at the output stage.

• Make use of white lists and black lists. Use regular expressions to match data for authorized or unauthorized content. White lists contain patterns of acceptable content. Black lists contain patterns of unacceptable or malicious content. It’s typically easier (and better advised) to rely on white lists because the set of all malicious content to be blocked is potentially unbounded. Also, you can only create blacklist patterns for known attacks; new attacks will fly by with impunity. Still, having a black list of a few malicious constructs like those used in simple SQL injection and cross-site scripting attacks is a good idea.

• Securely handle errors. Regardless of what language is used to write the application, error handling should follow the concept of try, catch, finally exception handling. Try an action; catch specific exceptions that the action may cause; finally exit nicely if all else fails. This also entails a generic, polite error page that does not contain any system information.

• Require authentication. In some cases, it may make sense to configure the server to require proper authentication at the directory level for all files within that directory.

• Use least-privilege access. Run the web server and any supporting applications as an account with the least permissions possible. An application susceptible to arbitrary command execution poses less risk if it cannot access the /sbin directory (where many Unix administrator tools are stored) than a similar application that can execute commands in the context of the root user.
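Several of these points (normalize first, constrain the data type, handle errors generically) can be combined in one short server-side sketch in Python. All names are hypothetical, and decoding until the value stops changing is only one possible normalization policy; some applications may prefer to reject any value that is still encoded after a single pass.

```python
import re
from urllib.parse import unquote

# ASCII digits only: five digits, optionally followed by "ZIP plus four".
ZIP_RE = re.compile(r"[0-9]{5}(-[0-9]{4})?")
stats = {"handled": 0}


def normalize(value: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly until the value stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            return value
        value = decoded
    raise ValueError("input is encoded too many times")


def is_valid_zip(value: str) -> bool:
    """Accept only the exact five-digit or ZIP plus four formats."""
    return ZIP_RE.fullmatch(value) is not None


def handle_zip_field(raw: str) -> str:
    """Validate on the server and reveal no system detail on failure."""
    try:
        value = normalize(raw)          # canonicalize before checking
        if not is_valid_zip(value):
            raise ValueError("bad ZIP")
        return "OK: " + value
    except ValueError:
        return "Sorry, that request could not be processed."
    finally:
        # Runs on success and failure alike; a real handler would
        # release files or database connections here.
        stats["handled"] += 1


print(normalize("..%255c.."))              # ..\..
print(handle_zip_field("%31%32%33%34%35"))  # OK: 12345
print(handle_zip_field("<script>"))        # Sorry, that request could not be processed.
```

The first call shows why the order matters: a filter that inspected the doubly encoded string before decoding would never see the backslash.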

Bibliographic Information