Monday, June 28, 2010

Comparing web application scanners, part 2

A new paper has been published by UCSB analysing the performance of various web application vulnerability scanners, which the authors say is "the largest evaluation of web application scanners in terms of the number of tested tools ... and the class of vulnerabilities analyzed".
The authors created their own test application containing a wide variety of vulnerabilities and crawling challenges, and carried out what appears to be a very detailed and rigorous analysis of each scanner's performance against this application.
Scanners were scored based on their ability to identify different types of vulnerabilities in different scanning modes. The overall scores, together with the prices of each scanner, were as follows:
Scanner  Score  Price
Grendel-Scan  3  Free
In addition to these core results, the authors also drew the following conclusions:
  • There are whole classes of vulnerabilities that cannot be detected by the state-of-the-art scanners, including weak passwords, broken access controls and logic flaws.
  • The crawling of modern web applications can be a serious challenge for today’s web vulnerability scanners, due to incomplete support for common client-side technologies and the complex stateful nature of today's applications.
  • There is no strong correlation between price and capability, as some of the free or very cost-effective scanners performed as well as scanners that cost thousands of dollars.
I must say, I completely agree with these conclusions. Firstly, Burp Scanner was designed with a clear awareness of the kinds of issues that scanners can reliably look for. It seeks to automate everything that can be reliably automated, giving you confidence in its output, and leaving you to focus on the aspects of the job that require human experience and intelligence to deliver.
Secondly, devising a fully automated crawler that provides comprehensive coverage of today's applications, with their widely varied technologies and stateful designs, is a Herculean task. Even the best crawlers fall very far short of this, and claiming otherwise only gives false reassurance that this key part of application testing can be left to a machine. Burp Spider does provide crawling capabilities, both active and passive, but this feature is designed to be used in tandem with manual application mapping, and human sense-checking of the coverage achieved and the requests that need to be scanned for vulnerabilities.
I was, of course, pleased to see this recognition of Burp Scanner's capabilities, and the above comparison of scanners' performance versus price should make interesting reading for anyone who is deciding which products to spend their money on. Rest assured, I'll be going through the raw results from this survey in detail, and looking at ways to make Burp even more effective.

Tuesday, June 22, 2010

Comparing web application scanners

Earlier this year, Larry Suto published a paper comparing web application vulnerability scanners. It contained plenty that was worthy of discussion, but I was particularly interested in what he said about Burp Scanner. Rather belatedly (I've been busy), here are my thoughts about this.

Larry ran each scanner against various test applications developed by other scan vendors for the purpose of showcasing their products. He ran each scanner in "point and shoot" mode (where it is given just the URL for the application) and also in "trained" mode (where it is manually shown which pages it is supposed to test). Larry then added up all of the vulnerabilities found by each scanner against all of the test applications.

Now, in contrast to the other scanners, Burp is not designed to be a "point and shoot" tool, where the user provides a URL and hits "go". Rather, it is designed to support hands-on penetration testing. What Larry calls "training" for the other scanners is the primary modus operandi for Burp. Therefore, I wasn't much interested in the "point and shoot" numbers for Burp, as they aren't applicable to its intended use.

After Larry had "trained" each scanner on the test applications, he reported the following numbers of vulnerabilities found by each scanner:

WebInspect 52

When I first saw these numbers, I was surprised that Burp came significantly behind some other products. Based on my own comparisons with these scanners, and on very widespread feedback from users, this did not ring true. My immediate thought was that Burp had not been "trained" properly on the test applications. Burp provides the user with very fine-grained control over what gets scanned. To ensure complete coverage of an application, you need to ensure that Burp scans every request - that is, every page, every form submission, every asynchronous client request, etc. I suspected that Larry had not made Burp scan all of the relevant application requests, and so had missed a lot of bugs.
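
To make the coverage point concrete, here is a rough sketch of the kind of check involved: compare the requests seen while browsing an application with the requests actually sent for scanning, and list any that were missed. This is not Burp's own interface. The request tuples, the `request_key` and `unscanned` helpers and the sample data are all hypothetical, invented purely for illustration.

```python
# Hypothetical sketch: find application requests that were never scanned.
# A request is modelled as (method, path, parameter names); this is an
# assumption for illustration, not how any particular scanner stores requests.

def request_key(method, path, param_names):
    """Normalise a request so the same endpoint always compares equal."""
    return (method.upper(), path, tuple(sorted(param_names)))


def unscanned(observed, scanned):
    """Return the observed requests that do not appear among the scanned ones."""
    scanned_keys = {request_key(*req) for req in scanned}
    return [req for req in observed if request_key(*req) not in scanned_keys]


# Invented example data: three requests seen while browsing, two of them scanned.
observed_requests = [
    ("GET", "/search", ["q"]),
    ("POST", "/register", ["username", "email"]),
    ("POST", "/ajax/update", ["id", "value"]),
]
scanned_requests = [
    ("get", "/search", ["q"]),
    ("POST", "/register", ["email", "username"]),
]

missing = unscanned(observed_requests, scanned_requests)
for method, path, params in missing:
    print(f"not scanned: {method} {path} ({', '.join(params)})")
```

In this made-up example the asynchronous `/ajax/update` request is the one left out, which is exactly the sort of gap that can hide bugs from a "trained" scan.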

I spent just a couple of hours running Burp against the test applications used in the survey, and got very different results. Simply by ensuring that Burp was actually scanning every relevant request, and doing nothing else to optimise its performance, I found that Burp performed significantly better:

WebInspect 52

This was a relief, and closer to my expectations of Burp's capabilities. Still, the most striking feature of the above numbers is the fact that NTOSpider appears to be ahead by a mile. This surprised many people, and led some to suggest that cheating or collusion had occurred. I doubt this - the reality is more mundane. When we drill down into Larry's raw data of the vulnerabilities found by each scanner, we find a few cases where NTO alone identifies XSS or SQL injection in a request containing a large number of parameters, and each parameter is counted as a separate vulnerability in the raw numbers. This might be reasonable if each parameter represented a different flavour of the vulnerability type, designed to establish scanners' ability to find different varieties of bugs. But this was not the case: each parameter manifested identical behaviour in these cases.

In some cases, NTO deserves credit for reporting issues where other scanners did not (for example, in a user registration page which required a different username in each request in order to uncover bugs in other parameters). Nevertheless, crudely summing the raw numbers in these cases has skewed the results quite misleadingly. If these duplicated issues with multiple parameters are consolidated, NTO's numbers come down into line with the other leading products.
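
The effect of consolidation can be sketched in a few lines. The findings below are invented for illustration and are not taken from Larry's raw data; the scanner names, requests and the two counting functions are hypothetical. The sketch assumes, as in the cases described above, that every parameter in a given request behaves identically, so that one request and vulnerability type counts as one issue.

```python
from collections import defaultdict

# Hypothetical raw findings: (scanner, request, vulnerability type, parameter).
# These rows are made up for illustration only.
RAW_FINDINGS = [
    ("ScannerA", "POST /register", "XSS", "firstname"),
    ("ScannerA", "POST /register", "XSS", "lastname"),
    ("ScannerA", "POST /register", "XSS", "address"),
    ("ScannerA", "GET /search", "SQL injection", "q"),
    ("ScannerB", "GET /search", "SQL injection", "q"),
    ("ScannerB", "GET /item", "XSS", "id"),
]


def raw_counts(findings):
    """Count every reported parameter as a separate vulnerability."""
    counts = defaultdict(int)
    for scanner, _request, _vuln, _param in findings:
        counts[scanner] += 1
    return dict(counts)


def consolidated_counts(findings):
    """Count each (request, vulnerability type) pair once per scanner."""
    unique = defaultdict(set)
    for scanner, request, vuln, _param in findings:
        unique[scanner].add((request, vuln))
    return {scanner: len(issues) for scanner, issues in unique.items()}


print("raw:", raw_counts(RAW_FINDINGS))
print("consolidated:", consolidated_counts(RAW_FINDINGS))
```

With this invented data, the raw totals put ScannerA well ahead, while the consolidated totals put the two scanners level. If the parameters had represented genuinely different flavours of a bug, counting them separately would be the fairer choice.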

I know that other scan vendors have also responded to Larry, and in some cases attacked his methodology or claimed unfair treatment. I think that Larry has had a decent stab at an inherently difficult task, and I don't think he deserves to be flamed about it. There is plenty of interesting and subtle analysis behind the headline numbers in his paper. But I do contend that the raw numbers are misleading and certainly don't reflect Burp Scanner's true capabilities.

I do actually have reason to thank Larry for what he has done. In the course of reperforming his analysis of Burp, I did identify a few cases where Burp was missing a trick, and have recently made some enhancements to the core scanning engine to rectify these (coming in release v1.3.06). After these revisions, Burp now correctly reports an additional 16 vulnerabilities within the test applications, which is good news for users of Burp.