This article discusses techniques to detect SQL Injection and Cross Site Scripting (CSS) attacks against your networks. There has been a lot of discussion on these two categories of Web-based attacks about how to carry them out, their impact, and how to prevent these attacks using better coding and design practices. However, there is not enough discussion on how these attacks can be detected. We take the popular open-source IDS Snort [ref 3], and compose regular-expression based rules for detecting these attacks. Incidentally, the default ruleset in Snort does contain signatures for detecting cross-site scripting, but these can be evaded easily. Most of them can be evaded by using the hex-encoded values of strings such as %3C%73%63%72%69%70%74%3E instead of <script>.
We have written multiple rules for detecting the same attack depending upon the organization's level of paranoia. If you wish to detect each and every possible SQL Injection attack, then you simply need to watch out for any occurrence of SQL meta-characters such as the single-quote, semi-colon or double-dash. Similarly, a paranoid way of checking for CSS attacks would be to simply watch out for the angled brackets that signify an HTML tag. But these signatures may result in a high number of false positives. To avoid this, the signatures can be modified to be made accurate, yet still not yield too many false positives.
Each of these signatures can be used with or without other verbs in a Snort signature using the pcre [ref 4] keyword. These signatures can also be used with a utility like grep to go through the Web-server's logs. But the caveat is that the user input is available in the Web server's logs only if the application uses GET requests. Data about POST requests is not available in the Web server's logs.
2. Regular Expressions for SQL Injection
An important point to keep in mind while choosing your regular expression(s) for detecting SQL Injection attacks is that an attacker can inject SQL into input taken from a form, as well as through the fields of a cookie. Your input validation logic should consider each and every type of input that originates from the user -- be it form fields or cookie information -- as suspect. Also if you discover too many alerts coming in from a signature that looks out for a single-quote or a semi-colon, it just might be that one or more of these characters are valid inputs in cookies created by your Web application. Therefore, you will need to evaluate each of these signatures for your particular Web application.
As mentioned earlier, a trivial regular expression to detect SQL injection attacks is to watch out for SQL specific meta-characters such as the single-quote (') or the double-dash (--). In order to detect these characters and their hex equivalents, the following regular expression may be used:
2.1 Regex for detection of SQL meta-characters
We first detect either the hex equivalent of the single-quote, the single-quote itself or the presence of the double-dash. These are SQL characters for MS SQL Server and Oracle, which denote the beginning of a comment, and everything that follows is ignored. Additionally, if you're using MySQL, you need to check for presence of the '#' or its hex-equivalent. Note that we do not need to check for the hex-equivalent of the double-dash, because it is not an HTML meta-character and will not be encoded by the browser. Also, if an attacker tries to manually modify the double-dash to its hex value of %2D (using a proxy like Achilles [ref 5]), the SQL Injection attack fails.
The above regular expression would be added into a new Snort rule as follows:
alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS (msg:"SQL Injection - Paranoid"; flow:to_server,established;uricontent:".pl";pcre:"/(\%27)|(\')|(\-\-)|(%23)|(#)/i"; classtype:Web-application-attack; sid:9099; rev:5;)
In this case, the uricontent keyword has the value ".pl", because in our test environment, the CGI scripts are written in Perl. Depending upon your particular application, this value may be either ".php", or ".asp", or ".jsp", etc. From this point onwards, we do not show the corresponding Snort rule, but instead only the regular expressions that are to be used for creating these rules. From the regular expressions you can easily create more Snort rules.
In the previous regular expression, we detect the double-dash because there may be situations where SQL injection is possible even without the single-quote [ref 6]. Take, for instance, an SQL query which has the where clause containing only numeric values. Something like:
select value1, value2, num_value3 from database
In this case, the attacker may execute an additional SQL query, by supplying an input like:
3; insert values into some_other_table
Finally, pcre modifiers 'i' and 'x' are used in order to match without case sensitivity and to ignore whitespaces, respectively.
The above signature could be additionally expanded to detect the occurrence of the semi-colon as well. However, the semi-colon has a tendency to occur as part of normal HTTP traffic. In order to reduce the false positives from this, and also from any normal occurrence of the single-quote and double-dash, the above signature could be modified to first detect the occurrence of the = sign. User input will usually occur as a GET or a POST request, where the input fields will be reflected as:
Therefore, the SQL injection attempt would result in user input being preceded by a = sign or its hex equivalent.
2.2 Modified regex for detection of SQL meta-characters
This signature first looks out for the = sign or its hex equivalent (%3D). It then allows for zero or more non-newline characters, and then it checks for the single-quote, the double-dash or the semi-colon.
A typical SQL injection attempt of course revolves around the use of the single quote to manipulate the original query so that it always results in a true value. Most of the examples that discuss this attack use the string 1'or'1'='1. However, detection of this string can be easily evaded by supplying a value such as 1'or2>1--. Thus the only part that is constant in this is the initial alphanumeric value, followed by a single-quote, and then followed by the word 'or'. The Boolean logic that comes after this may be varied to an extent where a generic pattern is either very complex or does not cover all the variants. Thus these attacks can be detected to a fair degree of accuracy by using the next regular expression, in section 2.3 below.
2.3 Regex for typical SQL Injection attack
\w* - zero or more alphanumeric or underscore characters
(\%27)|\' - the ubiquitous single-quote or its hex equivalent
(\%6F)|o|(\%4F))((\%72)|r|(\%52) - the word 'or' with various combinations of its upper and lower case hex equivalents.
The use of the 'union' SQL query is also common in SQL Injection attacks against a variety of databases. If the earlier regular expression that just detects the single-quote or other SQL meta characters results in too many false positives, you could further modify the query to specifically check for the single-quote and the keyword 'union'. This can also be further extended to other SQL keywords such as 'select', 'insert', 'update', 'delete', etc.
2.4 Regex for detecting SQL Injection with the UNION keyword
(\%27)|(\') - the single-quote and its hex equivalent
union - the keyword union
Similar expressions can be written for other SQL queries such as >select, insert, update, delete, drop, and so on.
If, by this stage, the attacker has discovered that the Web application is vulnerable to SQL injection, he will try to exploit it. If he realizes that the back-end database is on an MS SQL server, he will typically try to execute one of the many dangerous stored and extended stored procedures. These procedures start with the letters 'sp' or 'xp' respectively. Typically, he would try to execute the 'xp_cmdshell' extended procedure, which allows the execution of Windows shell commands through the SQL Server. The access rights with which these commands will be executed are those of the account with which SQL Server is running -- usually Local System. Alternatively, he may also try and modify the registry using procedures such as xp_regread, xp_regwrite, etc.
2.5 Regex for detecting SQL Injection attacks on a MS SQL Server
exec - the keyword required to run the stored or extended procedure
(\s|\+)+ - one or more whitespaces or their HTTP encoded equivalents
(s|x)p - the letters 'sp' or 'xp' to identify stored or extended procedures respectively
\w+ - one or more alphanumeric or underscore characters to complete the name of the procedure
3. Regular Expressions for Cross Site Scripting (CSS)
When launching a cross-site scripting attack, or testing a Website's vulnerability to it, the attacker may first issue a simple HTML formatting tag such as <b> for bold, <i> for italic or <u> for underline. Alternatively, he may try a trivial script tag such as <script>alert("OK")</script>. This is likely because most of the printed and online literature on CSS use this script as an example for determining if a site is vulnerable to CSS. These attempts can be trivially detected. However, the advanced attacker may attempt to camouflage the entire string by entering its Hex equivalents. So the <script> tag would appear as %3C%73%63%72%69%70%74%3E. On the other hand, the attacker may actually use a Web Application Proxy like Achilles and reverse the browser's automatic conversion of special characters such as < to %3C and > to %3E. So the attack URL will contain the angled brackets instead of their hex equivalents as would otherwise normally occur.
The following regular expression checks for attacks that may contain HTML opening tags and closing tags <> with any text inside. It will catch attempts to use <b> or <u> or <script>. The regex is case-insensitive. We also need to check for the presence of angled brackets, as well as their hex equivalents, or (%3C|<). To detect the hex conversion of the entire string, we must check for the presence of numbers as well as the % sign in the user input, in other words, the use of [a-z0-9%]. This may sometimes result in false-positives, but most of the time will detect the actual attack.
3.1 Regex for simple CSS attack
((\%3C)|<) - check for opening angle bracket or hex equivalent
((\%2F)|\/)* - the forward slash for a closing tag or its hex equivalent
[a-z0-9\%]+ - check for alphanumeric string inside the tag, or hex representation of these
((\%3E)|>) - check for closing angle bracket or hex equivalent
alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS (msg:"NII Cross-site scripting attempt"; flow:to_server,established; pcre:"/((\%3C)|<)((\%2F)|\/)*[a-z0-9\%]+((\%3E)|>)/i"; classtype:Web-application-attack; sid:9000; rev:5;)
Cross-site scripting can also be accomplished by using the <img src=> technique. The existing default snort signature can be easily evaded. The one supplied in section 3.2 will be much tougher to evade.
3.2 Regex for "<img src" CSS attack
(\%3C)|<) opening angled bracket or hex equivalent
(\%69)|i|(\%49))((\%6D)|m|(\%4D))((\%67)|g|(\%47) the letters 'img' in varying combinations of ASCII, or upper or lower case hex equivalents
[^\n]+ any character other than a new line following the <img
(\%3E)|>) closing angled bracket or hex equivalent
3.3 Paranoid regex for CSS attacks
This signature simply looks for the opening HTML tag, and its hex equivalent, followed by one or more characters other than the newline, and then followed by the closing tag or its hex equivalent. This may end up giving a few false positives depending upon how your Web application and Web server are structured, but it is guaranteed to catch anything that even remotely resembles a cross-site scripting attack.
For an excellent reference on types of cross-site scripting attacks that will evade filters, see the Bugtraq posting http://www.securityfocus.com/archive/1/272037. However, note that the last of the cross-site scripting signatures, which is the paranoid signature, will detect all these attacks.
In this article, we've presented different types of regular expression signatures that can be used to detect SQL Injection and Cross Site Scripting attacks. Some of the signatures are simple yet paranoid, in that they will raise an alert even if there is a hint of an attack. But there is also the possibility that these paranoid signatures may result in false positives. To take care of this, we've then modified the simple signatures with additional pattern checks so that they are more accurate. We recommend that these signatures be taken as a starting point for tuning your IDS or log analysis methods, in the detection of these Web application layer attacks. After a few modifications, and after taking into account the non-malicious traffic that occurs as part of your normal Web transactions, you should be able to accurately detect these attacks.