Tuesday, June 22, 2010

Comparing web application scanners

Earlier this year, Larry Suto published a paper comparing web application vulnerability scanners. It contained plenty that was worthy of discussion, but I was particularly interested in what he said about Burp Scanner. Rather belatedly (I've been busy), here are my thoughts about this.

Larry ran each scanner against various test applications developed by other scan vendors for the purpose of showcasing their products. He ran each scanner in "point and shoot" mode (where it is given just the URL for the application) and also in "trained" mode (where it is manually shown which pages it is supposed to test). Larry then added up all of the vulnerabilities found by each scanner against all of the test applications.

Now, in contrast to the other scanners, Burp is not designed to be a "point and shoot" tool, where the user provides a URL and hits "go". Rather, it is designed to support hands-on penetration testing. What Larry calls "training" for the other scanners is the primary modus operandi for Burp. Therefore, I wasn't much interested in the "point and shoot" numbers for Burp, as they aren't applicable to its intended use.

After Larry had "trained" each scanner in the test applications, he reported the following numbers of vulnerabilities found by each scanner:

WebInspect 52

When I first saw these numbers, I was surprised that Burp came significantly behind some other products. Based on my own comparisons with these scanners, and on very widespread feedback from users, this did not ring true. My immediate thought was that Burp had not been "trained" properly on the test applications. Burp provides the user with very fine-grained control over what gets scanned. To ensure complete coverage of an application, you need to ensure that Burp scans every request - that is, every page, every form submission, every asynchronous client request, etc. I suspected that Larry had not made Burp scan all of the relevant application requests, and so had missed a lot of bugs.

I spent just a couple of hours running Burp against the test applications used in the survey, and got very different results. Simply by ensuring that Burp was actually scanning every relevant request, and doing nothing else to optimise its performance, I found that Burp performed significantly better:

WebInspect 52

This was a relief, and closer to my expectations of Burp's capabilities. Still, the most striking feature of the above numbers is the fact that NTOSpider appears to be ahead by a mile. This surprised many people, and led some to suggest that cheating or collusion had occurred. I doubt this - the reality is more mundane. When we drill down into Larry's raw data of the vulnerabilities found by each scanner, we find a few cases where NTO alone identifies XSS or SQL injection in a request containing a large number of parameters, and each parameter is counted as a separate vulnerability in the raw numbers. This might be reasonable if each parameter represented a different flavour of the vulnerability type, designed to establish scanners' ability to find different varieties of bugs. But this was not the case: each parameter manifested identical behaviour in these cases.

In some cases, NTO deserves credit for reporting issues where other scanners did not (for example, in a user registration page which required a different username in each request in order to uncover bugs in other parameters). Nevertheless, crudely summing the raw numbers in these cases has skewed the results quite misleadingly. If these duplicated issues with multiple parameters are consolidated, NTO's numbers come down into line with the other leading products.
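The effect of consolidating per-parameter duplicates can be sketched in a few lines. The data below is purely illustrative (hypothetical pages, parameters, and vulnerability types, not Larry's actual raw results); the point is that counting one issue per affected parameter inflates the total relative to counting one issue per request and vulnerability type:

```python
# Hypothetical raw findings: one entry per (page, parameter, vuln_type).
# These names and counts are made up for illustration only.
raw_findings = [
    ("register.php", "username", "SQLi"),
    ("register.php", "email",    "SQLi"),
    ("register.php", "address",  "SQLi"),
    ("search.php",   "q",        "XSS"),
]

# Raw count: every affected parameter is a separate vulnerability.
raw_count = len(raw_findings)

# Consolidated count: identical issues in the same request collapse
# into a single finding per (page, vuln_type) pair.
consolidated = {(page, vuln) for page, _param, vuln in raw_findings}
consolidated_count = len(consolidated)

print(raw_count, consolidated_count)  # 4 raw findings vs 2 consolidated issues
```

Under this counting scheme, a scanner that flags one SQL injection in a ten-parameter form scores ten raw findings but only one consolidated issue, which is exactly the kind of skew described above.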

I know that other scan vendors have also responded to Larry, and in some cases attacked his methodology or claimed unfair treatment. I think that Larry has had a decent stab at an inherently difficult task, and I don't think he deserves to be flamed about it. There is plenty of interesting and subtle analysis behind the headline numbers in his paper. But I do contend that the raw numbers are misleading and certainly don't reflect Burp Scanner's true capabilities.

I do actually have reason to thank Larry for what he has done. In the course of reperforming his analysis of Burp, I did identify a few cases where Burp was missing a trick, and have recently made some enhancements to the core scanning engine to rectify these (coming in release v1.3.06). After these revisions, Burp now correctly reports an additional 16 vulnerabilities within the test applications, which is good news for users of Burp.


The Ubiquitous Mr. Lovegroove said...

Frankly, I think Burp Suite is the best value for money on the web. I've been _VERY_ happy with it.
The one point-and-shoot feature that could be added is Netsparker's vulnerability confirmation ability, which is impressive.

kroyster said...

I suspected the same problems in the original analysis. Also, the original report says that Burp Pro doesn't support automated form population, which isn't true. And it notes that Burp Pro doesn't support JavaScript, which *might* be true and a problem for spidering JavaScript navigation, but it is not a problem for manual navigation - the intended use of Burp. On a positive note, it noted that Burp had the fastest scan times.

Mephisto said...

Did anyone ever ask whether Larry actually knew how to configure these scanners for optimal performance and discovery? I find it hard to believe that he knew the purpose of every possible setting, and its potential impact on the results, across all of the scanners tested.

informatiocautela said...

I bought Burp for myself because of Larry's report. The real test is the results, so I believe you need to ask questions like: "Can a tool find the majority of the problems caused by poor coding practices in web applications?" and "How much time is (not) wasted validating findings?" Burp does an awesome job for someone who is a trust-but-verify type of person!

Daniel said...

Daf, or indeed anyone else in the community,

Maybe what could be helpful in this situation is a blog post/guide on how to tune Burp to ensure it operates as expected.

Those of us who use it on a regular basis know how to do this but many it seems don't.

pierz said...

Very good post. I didn't take the time to read this web scanner test because the NTOSpider score was far too high, which is why I thought the report was a big fake. Now I understand what was wrong. Every web scanner has approximately the same capability; the difference is the price.

Anonymous said...

After reading the whitepaper I contacted NTObjectives for a trial version, convinced that I was going to get it because "it must perform much better than anything else". Apparently this whitepaper got many other people to do the same thing, and it was indeed one of the main selling points of the presentation. However, after running it on a real-life application it performed unbelievably badly. Acunetix did slightly better, but Burp was the only tool that found the major vulnerabilities, and in a much shorter period of time. So while the NTOSpider GUI looks nice and they claim it should work better, I was never able to get it to work as claimed... and couldn't justify the extremely hefty price tag... so I am happy to stick with Burp for now.

Moral of the story: don't believe everything you read...