Today I’m writing about the length extension attack. It could be used in many applications so I felt that developers who are reading this blog should get to know how this works.
You have a site that allows users to upload files and make them available to their customers. When a user logs in to view resources you give these users access to certain folders on your site and place it in a cookie. You decide that the cookie should have a signature for security 😉 .
The signature is a salted sha1 hash of the resource in the following format.
Signature = SHA1(KEY||PERMISSIONS)
Before showing the user a file you verify that they have access to the resource by calculating the signature and then verify that the signature matches what they passed to you in the cookie.
Now why would you do this instead of encrypting the whole damn thing? Simple… Encryption does not provide integrity, meaning from a programming standpoint it’s safer. Because if an encrypted cookie is tampered with your application can fail in the decryption throwing, exceptions and access violations. I have seen this lead to other notorious things like buffer overruns. Hashing data that is passed, even if tampered with is much safer than decrypting a value that has been tampered with. If you need confidentiality (meaning something in the cookie is secret) then by all means encrypt the cookie, you’ll still however need a MAC to validate the cookie (see Padding oracle attack against ASP.NET).
The Problem With The Approach
Lets look at a generic hash function….we’ll call it H. Pretend it’s this fantastic function below.
It’s very simple you see it takes some fixed length doodoo in and spits some fixed length doodoo out, the same input value will always have the same corresponding output value AND I know exactly how long that output is going to be every single time.
My hash function operates on fixed lengths (blocks), this isn’t going to work…we need it to work on arbitrary lengths. Now how do we achieve this? Well the Merkle–Damgård construction (MD5,SHA1,SHA2) splits our hash function into multiple steps, it operates on fixed length blocks chained together.
There are two inputs, our block and the hash of the previous block. For The first block there is an initialization vector which just gives you a starting point. By looking at this picture it’s easy to see that this construction is a simple state machine. You can take any of the outputs from any of the blocks and you will get the internal state of that machine.
What does this mean my nerdy friend?
Lets pretend that my hash function operates on blocks of characters and not blocks of bytes. Lets also pretend that a single block is the length of 1 character. My Hash function H now has two inputs to generate its output. Therefore:
H(IV,ABC) = H(H(IV,AB),C)
If I were to do the hash of AB and ABC the internal state would be exactly the same until the last letter (C) is processed (makes sense right?)
Lets look at that cookie again
Lets say your user logged in and they only have access to A_RESOURCE.
Lets pretend that user turns out to be malicious, they want some free shit like B_RESOURCE. What can they do? Well if they know the salt (key) then no problem, but we’re going to assume your attacker doesn’t have access to the key. If they do then you have bigger problems and you should stop reading right about now and go try to figure out WTF happened to your shit.
My attacker wants to get access to B_RESOURCE so they need to generate a signature without knowing what the key is. Now Remember the previous formula.
H(IV,ABC) = H(H(IV,AB),C)
Treat our current signature as our hash functions internal state. So therefore:
H(IV,KEY||A_RESOURCE||B_RESOURCE) = H(H(IV,KEY||A_RESOURCE),B_RESOURCE)
In other words
NewSignature = H(Signature,B_RESOURCE)
Relax It’s Not Exactly That Simple
Before you go freaking out…It’s not exactly that simple, you’re not going to just take your signature and add your B_RESOURCE to it. This attack only works if your signature is in a certain format, for example this wouldn’t always* work if the signature incorporated the timestamp at the end of the hash like so:
Signature = H(KEY||A_RESOURCE||TimeStamp)
Because it would be difficult to generate a message with the new permissions in the middle.
Signature = H(KEY||A_RESOURCE||B_RESOURCE||TimeStamp)
You also need to know how long the salt is to construct your new message because you need to know how much padding was added, but this isn’t particularly difficult to find out (trial and error). This attack is called the length extension attack, and regardless of how simple or complex it is there are tools out there that allow attackers to perform length extension attacks against your application. It’s not beyond a determined attacker to do so, especially if you have data that has some value.
This won’t happen to me…
Oh Ye of little faith…. It happened to Flickr… nuff said…
There are a lot of crappy solutions to this problem and a good solution. I’m just going to give you the good solution and it’s really simple… Don’t use a straight up hash function with salt, instead use the HMAC construction of that hash function. HMAC is resistant to the length extension attack as well as other problems that plague these cryptographic hash functions…. and guess what! It’s made specifically for this type of scenario, so why reinvent the wheel? There are many libraries that use the HMAC construction and if you use the .NET framework for your work you can simply use the HMAC construction available within the System.Security.Cryptography library. SHA3 (Keccak) is also resistant to the length extension attack without the HMAC construction, but it’s relatively new (it was accepted as the standard in October 2012) so it will take a little time before it’s adopted in libraries out there.
*Putting something at the end wouldn’t always work but in some cases and with some of the development practices out there it is plausable that you could extend the message after the timestamp and it would still result in a valid signature and permissions. You’ve been warned…