Saturday 17 November 2012

Suicide

Anger

Long story short, I was in a queue for food in the cafeteria at work, with a postgrad behind me. She was loudly wittering on the phone to someone, complaining about everything. I wrote her off as a little self-centred and left it at that.

Right up until she said her train was late, "because some woman was walking on the tracks" shortly followed by "If they want to kill themselves, we should just let them!"

At that point, I started to get angry. However, I did not say anything. I bit my tongue.

For reference, this blog post is not directed at those with mental health issues, per se. It is more directed at those who judge them.

Just Let Them

This is the feeling that a lot of the general populace have about people with mental health issues. It is, at its core, an argument based on a fundamental misconception about mental health issues and the people who suffer with them. Not only is the argument flawed, it also precludes the option for treatment in so many cases.

As a quick disclaimer, I am not a mental health care professional, and I do not suffer with any mental health issues to the best of my knowledge. I therefore fill the middle ground: the lay folk, the ones farthest from the problems in most cases. Despite this, I've had a lot of exposure to mental health issues, from close friends and lovers suffering with bipolar disorder, PTSD, anxiety, depression, self-harm, paranoia and suicidal tendencies, to actively working on a helpline during my undergraduate days.

The Faulty Premise

There is one core, faulty premise at the heart of the "just let them" argument. That premise is that people with mental health issues somehow chose their suffering. That is to say, they're either responsible for starting the illness, responsible for not fixing it, or somehow deserve it because of how the illness makes them behave.

These people are usually suffering because of a genetic predisposition, which they didn't choose; environmental conditions during their childhood, which they didn't choose; or a traumatic experience during their lifetime, which they often did not choose.

Notice how not one of those criteria is something that was chosen by the sufferer. 

What this means is that the sufferers are not choosing to end their lives. They are choosing to end their suffering. Suicide is the last way out that they can see. If you were in agony every day, and your doctors had told you that they'll see what they can do, but it might never work, wouldn't you look for other options?

I'm choosing to ignore the "what about the sufferer's family and friends?" retort here, as it devalues those who have no family or friends. Think of an old person who has outlived their friends and family, or many children who are in care. Those are some of the most vulnerable in our society, and to devalue them in such a way is to put them at further risk.

Treatment

Since the "just let them" argument simply shifts the blame to the sufferer, it's not difficult to make they leap to "they did it to themselves, they can fix it themselves".

If I became injured, the NHS would treat me to the best of their abilities. Even if it was in a car accident that was my fault, the NHS would patch me up. I'm not precluded from treatment because of how I became injured. I would even get on-going physiotherapy if I needed it. Why is this ethos not extended to those with mental health issues?

Epilogue

I'd like to state that I am not against euthanasia. If this seems like a contradiction, I'll explain it. I believe that every person has the right to decide their own fate. However, I feel that the best way to deal with mental health issues is preventative. Ensure that the person has the best treatment available to them to improve their quality of life. 

Just because a person's quality of life has fallen, doesn't mean that it has to remain low -- there may be an option to improve their quality of life. However, if it is not an option, or it is not working, then the person should be able to decide their own fate, safely, quickly, as painlessly as possible.

Wednesday 7 November 2012

OATH Analysis

Overview

Here, we go over the OATH specification, and point out some potential problems.

The OATH specification details a couple of methods of producing One-Time Passwords (OTPs) and verifying them.

I am going to leave the significantly more complex OCRA challenge/response algorithm for another blog post.

HOTP Algorithm

The HMAC-Based One-Time Password (HOTP) scheme allows for the generation of OTPs that can be verified by a server. They are only properly suited for authenticating to one verifier, as they contain a counter which can become out of sync between the HOTP generator and the verifier.

The algorithm is as follows: the generator stores a key K and a counter C. To generate an OTP, the generator computes HMAC-SHA-1(K, C), truncates it to no fewer than 6 decimal digits and displays it. Mathematically, this can be expressed as:

HOTP(K, C) = StringToNumber(DynamicTruncate(HMAC-SHA-1(K, C)))

DynamicTruncate is an interesting function. It takes the last nibble (that is, the bitwise AND of the final byte and 0xf) and uses it as an offset into the output of HMAC-SHA-1(K, C), selects 4 sequential bytes starting there, and treats them as a 31-bit integer (by taking the bitwise AND of the most significant byte and 0x7f).

StringToNumber then reduces the dynamically truncated value mod 10^{Digits}, where Digits is the number of digits you want the output to be. Digits is expected to be 6, 7, or 8.
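As a rough sketch of how a generator might implement this in Java (the class name and method signature here are my own, not from the specification; only the javax.crypto calls are standard):

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class Hotp {

  /**
   * Generate an HOTP value from a shared key and a counter.
   * Sketch only -- the real specification also covers resynchronisation,
   * which is omitted here.
   */
  public static int generate(byte[] key, long counter, int digits) throws Exception {
    // The counter is fed to HMAC-SHA-1 as an 8-byte, big-endian value.
    byte[] counterBytes = new byte[8];
    for (int i = 7; i >= 0; i--) {
      counterBytes[i] = (byte) (counter & 0xff);
      counter >>>= 8;
    }

    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(key, "HmacSHA1"));
    byte[] hmac = mac.doFinal(counterBytes);   // 20 bytes of output

    // DynamicTruncate: the low nibble of the last byte picks an offset,
    // and 4 bytes from that offset are read as a 31-bit integer.
    int offset = hmac[hmac.length - 1] & 0x0f;
    int binary = ((hmac[offset]     & 0x7f) << 24)
               | ((hmac[offset + 1] & 0xff) << 16)
               | ((hmac[offset + 2] & 0xff) << 8)
               |  (hmac[offset + 3] & 0xff);

    // StringToNumber: reduce to the requested number of decimal digits.
    return binary % (int) Math.pow(10, digits);
  }
}

Formatting the result with leading zeros (e.g. String.format("%06d", otp)) gives the familiar 6-digit code.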

Analysis

The key is a shared secret, and the scheme requires that secret to be retrievable as plain text. This suffers from many of the same issues covered in a previous blog post, primarily credential leaks.

The algorithm isn't stateless with respect to its previous executions; that is to say, if the counter gets out of sync, the server stops accepting the OTPs.

TOTP Algorithm

The Time-based One-Time Password (TOTP) scheme is much the same as the HOTP algorithm, except that the counter is replaced by a time-based parameter T, often constructed so that it updates once a minute. Mathematically, this is:

TOTP(K) = HOTP(K,T)

T = floor((CurrentUnixTime - T0) / TimeStep)

Where T0 is the time to start counting steps. One can tweak the StringToNumber function to give 6 to 8 digits, just the same as in the HOTP scheme.
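As a rough illustration (again, class and method names are my own, reusing the HOTP sketch above), deriving T from the clock is all the extra work required:

public class Totp {

  /**
   * Compute a TOTP value by deriving the counter T from the current time.
   * t0 is the Unix time to start counting from and timeStep is the length
   * of a time slice in seconds (30 or 60 are typical choices).
   */
  public static int generate(byte[] key, long t0, long timeStep, int digits) throws Exception {
    long now = System.currentTimeMillis() / 1000L;   // current Unix time in seconds
    long t = (now - t0) / timeStep;                  // floor((CurrentUnixTime - T0) / TimeStep)
    return Hotp.generate(key, t, digits);            // reuse the HOTP sketch above
  }
}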

Analysis

There is no counter based on the number of times the algorithm has been executed, which means that, given the key is shared between multiple verifiers, they can all independently verify the OTPs without needing to sync their counters. This is nice in some ways, but very bad in others, as it could be seen as an encouragement to share keys.

There is also the issue of needing an OTP twice during the same time slice. What becomes of the old one? Do you just accept it? This makes it susceptible to fast replay or MITM attacks, as a person who realises they've given away a code they shouldn't have must wait for it to become invalid, rather than being able to invalidate it manually.

Conclusion

Broadly speaking, criticisms that can be levelled at one can be levelled at the other.

Output Format

The output format that was decided on in the introductory parts of the specification makes some sense: a 6 to 8 digit OTP can be entered on almost any device, and 6 to 8 digit LCD displays are cheap, making it possible to produce these devices in hardware cheaply.

It also facilitates an 'air-gapped' device, where the code is displayed on a screen. A human can be expected to reliably type 6 to 8 digits; a 50-character digital signature, not so much. This means that the OTPs can be used everywhere: home computers, laptops, touch-screen devices, mobile phones, locked-down internet cafes with no USB ports, and so on.

However, such a limited output format completely precludes the possibility of a stronger form of authentication. For example, outputting an ID, a counter/timestamp and a digital signature would be much stronger, and would remove the need for shared secrets.

Shared-Secrets

If you read this blog regularly, you are probably well aware that I find the idea of shared secrets to be quite unnerving.

Not only that, but the requirement to have the keys in plain text for computing HMAC-SHA-1(K, C) is somewhat problematic. You can't protect them cryptographically. What that means is that serious institutions would have to invest in a tamper-resistant hardware key-storage module. This makes a wide deployment for a high-security system somewhat more expensive, and any software would need to interface with it.

This means that for most purposes, you should generate a new key for every service that you wish to authenticate against, as no-one can reliably be trusted to protect your key. In many respects, this is no better than needing a unique pass phrase for each service that you authenticate with. Yes, the secret will be hard to guess, but spilling it will be awful.

I can envision a smart phone app which supports OATH. It would store a unique key for every service that you want to authenticate with, and only provide an OTP from an encrypted keystore when a master pass phrase was presented. I am tempted to write one just for the hell of it and get back into Programming Android.

Such an application would be great as a second layer of authentication for most services.

Other Attacks

Brute-Force

Given that the shared-secrets are recommended to be 160 bits, brute forcing the secret is not likely to be an option.

However, if a server does not rate limit my guesses, I only have to enumerate 10^{Digits} codes to have a verification server deem one of my OTPs valid. Typically, this is only 1,000,000 requests to a server, which modern hardware can do easily -- especially with a botnet.

Replay

Broadly speaking, a correctly implemented HOTP verification server is not vulnerable to a replay attack. However, a TOTP verification server may be.

Man-in-the-Middle (MITM)

All of the OTP schemes are vulnerable to relatively quick MITM attacks. Phishing attacks and the like are a real danger to this system.

Overall

Overall, I would say that an HOTP/TOTP capable device is more versatile than a YubiKey, but also less secure, as it is expensive for an organisation to protect their shared secrets, and shared secrets cannot be cryptographically hashed to secure them.

Hopefully, OCRA may solve some of the above problems without impeding the usability of the device, but I've yet to look into it.

Saturday 3 November 2012

Shared-Secret Authentication

Overview


Authentication is difficult. Setting aside lost passwords, weak passwords, the philosophical concept of "identity", and a host of other issues, we'll be looking at performing authentication "correctly" for users.

Issues

Credential Leaks

This is a simple attack. An attacker somehow gets their grubby little mitts on a set of (username, secret) and the jig is up.

But even if you're storing a hash of the user's secret, there are still pit-falls.

Timing Attacks

Timing attacks are often overlooked, but they're usually only a problem for high-value targets. They almost always require a high level of knowledge about the internal system, but can be used, for example, to enumerate all of the usernames on your system.

This can be a problem: what if, as on Facebook, your username is the user's email address? This is bad, because it means that you are responsible for leaking the data to an attacker, who may just add the addresses to a spam list, or use them as another step in an attack on a user.

Online Brute-force Attacks

These are the simplest to imagine. Just start guessing credentials and sending them to the server for authentication. You need to know nothing about the system to do this, except where it is.

These attacks can become more sophisticated, by using a botnet to distribute the authentication requests around the world, and by statistically guessing credentials. When combined with a timing attack to enumerate usernames and good statistical guessing of passwords, this actually becomes quite a formidable threat.

Replay Attacks

A replay attack is where an attacker captures a valid (usually encrypted) request to a system. The system doesn't distinguish one request from another, so when the attacker re-issues the request, the system gladly carries out the request again. This could be as simple as logging the user in, or moving money from one account to another.

Defences

Credential Leaks

These come in many forms. 
  • You leak your database, and the shared-secrets (i.e. passwords) are not protected by a hash.
  • You leak your database, and the shared-secrets are protected by a weak hash algorithm.
  • The credentials are passed in plain text over the wire on the way to your server.
  • The credentials might be extracted from a "secure" tunnel. 
Given the broad nature of this issue, I will give a few simple golden rules.

Ensure that any shared secrets are stored in your database using a strong hashing algorithm. I recommend scrypt and a reasonable password policy. With reasonable parameters, this ensures that any attacker has to spend a lot of cash to break an scrypt-hashed password. This covers the first two bullet points above.

And don't forget that when hashing the password, you need to hash it with a salt which is cryptographically random and reasonably large.
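As a small sketch of those two rules together -- the ScryptFunction interface below is a stand-in for whichever real scrypt library you use, not an actual API:

import java.security.SecureRandom;

public class PasswordStorage {

  /** Placeholder for whichever real scrypt binding you use -- not a real API. */
  public interface ScryptFunction {
    byte[] hash(char[] password, byte[] salt, int n, int r, int p);
  }

  private static final SecureRandom RANDOM = new SecureRandom();
  private final ScryptFunction scrypt;

  public PasswordStorage(ScryptFunction scrypt) {
    this.scrypt = scrypt;
  }

  /** Generate a fresh, cryptographically random salt for each user. */
  public byte[] newSalt() {
    byte[] salt = new byte[16];   // 128 bits is a reasonable size
    RANDOM.nextBytes(salt);
    return salt;
  }

  /**
   * Hash a password for storage. Keep the salt and the cost parameters
   * alongside the hash so they can be raised later without breaking old entries.
   */
  public byte[] hashForStorage(char[] password, byte[] salt) {
    return scrypt.hash(password, salt, 16384, 8, 1);
  }
}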

For the last two bullet points, on a web application always use SSL, and turn compression off. Do not store shared secrets in cookies or anywhere else on the client. If you're not writing a web application, ensure that you use something equivalent to SSL on the connection between the server and the client. If you can't have this, you're going to need something other than shared secrets for your authentication.

In general (for a web application) you want to ensure that your application is immune to XSS attacks, as it may actually be very easy for an attacker to use your site to send credentials elsewhere. This could be done by injecting a JavaScript keylogger into an otherwise perfectly valid page. If you're doing all this over SSL, then the subject of the attack will fully trust all of the script on the page to be from you, not an attacker, and will probably let the keylogger do its business. Not cool.

Timing Attacks

This is quite feasible. Given a login procedure something along these lines:
  1. Get the user's hash from the database. If there is no hash, return false.
  2. Hash the given secret and compare it to the one in the database. Return the appropriate value.
Given that a good hash function should take ~100ms per password hash, an attacker can send hundreds of username guesses and random passwords down the wire. The attacker then measures the response time. The ones with a large response time are the ones which had their random passwords hashed, which means that a username was present.

There are several ways to defend; the most common is to attempt to engineer your solution to take the same amount of time regardless of the path it takes. This conflates your authentication code with code that tries to defeat a timing attack.

A better method is to leave your authentication code as-is, and wrap your authentication in an object which forces all requests to have the same average response time, regardless of whether or not the authentication was successful.

This has the benefit that, if your authentication process needs to query an external system, as well as internal systems, the external system need not be timing-attack resilient, and can be as simple as possible. It also keeps any code for checking locally-managed passwords simple.

The wrapper would simply keep track of the average time that a successful authentication takes. In the event that the authentication was unsuccessful, the wrapper would force the response to take that average time in total. Some code based on this idea can be found on my GitHub.
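The repository has the full version; a stripped-down sketch of the idea (with names of my own choosing) looks something like this:

/**
 * Wraps any authenticator so that failed attempts take (on average) as long
 * as successful ones. Sketch only -- the real code on GitHub is more careful.
 */
public class ConstantTimeAuthenticator {

  /** Anything that can answer "are these credentials valid?". */
  public interface Authenticator {
    boolean authenticate(String username, char[] secret);
  }

  private final Authenticator delegate;
  private long averageSuccessNanos = 100000000L;   // start with a guess of ~100ms

  public ConstantTimeAuthenticator(Authenticator delegate) {
    this.delegate = delegate;
  }

  public synchronized boolean authenticate(String username, char[] secret) {
    long start = System.nanoTime();
    boolean ok = delegate.authenticate(username, secret);
    long elapsed = System.nanoTime() - start;

    if (ok) {
      // Keep a running average of how long successful authentications take.
      averageSuccessNanos = (averageSuccessNanos + elapsed) / 2;
    } else if (elapsed < averageSuccessNanos) {
      // Pad failures out to the average so response time leaks nothing.
      try {
        long padNanos = averageSuccessNanos - elapsed;
        Thread.sleep(padNanos / 1000000L, (int) (padNanos % 1000000L));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return ok;
  }
}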

The code in that repository is written in Java, and can be built using

ant dist

and run using

java -jar ./dist/lib/TimingAttack.jar

It will print out a series of authentication results and the times taken to reach each result. You should see that the time for each is (on average) the same, regardless of the result.

This doesn't protect against other side-channel attacks. If, for example, a malicious application was running on the same CPU as your application, there are a host of attacks (e.g. cache-miss side-channel attacks) that can leak information about what your application is doing. 

Online Brute-force Attacks

This is actually a very tough nut to crack. If we assume that the attacker is as sophisticated as possible (using a botnet to do statistically very good guessing for them), then we have a serious problem on our hands.

The first way to help deal with this is to force (or encourage) your users to pick very strong, difficult to guess passwords.

The second way is to rate limit any guesses. In general, this is A Good Thing™ as it prevents an attacker from using up system resources at an unreasonable rate, and thus protects against a simple denial of service attack.

The second way could be achieved by keeping a global (for Java, read Singleton) RateLimiter class which keeps a record of IP addresses and the most recent time each sent a request. If the last request is too recent, make them wait a long time.

An example of such rate limiting can be found in the same GitHub repository as the previous example. It can be run in a very similar way, first build it:


ant dist

and run using

java -jar ./dist/lib/OnlineBruteForceAttack.jar


A potential modification would be to slow down requests for each user account individually, as opposed to each IP address. That way, each IP address gets some number of reasonably fast guesses per user, which helps folks behind NATs. However, this means that an attacker can DoS a user. Some combination of IP address and account would be in order, but even that won't slow an attacker down much.
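A sketch of such a combined limiter (again, names of my own choosing, not code from the repository) might look like this:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of a rate limiter keyed on (IP address, account), so that one NAT'd
 * office does not lock everyone out, but one attacker cannot hammer one account.
 */
public class RateLimiter {

  private static final long MIN_INTERVAL_MILLIS = 1000;  // at most one guess per second per key

  private final Map<String, Long> lastAttempt = new ConcurrentHashMap<String, Long>();

  /** Returns true if this attempt is allowed, false if the caller should be made to wait. */
  public boolean allow(String ipAddress, String username) {
    String key = ipAddress + "|" + username;
    long now = System.currentTimeMillis();
    Long previous = lastAttempt.put(key, now);
    return previous == null || (now - previous) >= MIN_INTERVAL_MILLIS;
  }
}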


Replay Attacks

These are usually dealt with by correctly using SSL. SSL uses nonces (cryptographically strong random numbers) to ensure that two requests can be distinguished.

In a system which does not use SSL, you basically have to implement nonces yourself. This is not hard, but I won't cover it in depth here. Basically, you stick a random number into the request. If you've seen that number before, discard the request. If you didn't generate that number, discard the request.
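A bare-bones sketch of that bookkeeping (ignoring the expiry of old nonces, which a real system would need) could look like this:

import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Sketch of server-side nonce tracking to make requests distinguishable. */
public class NonceTracker {

  private final SecureRandom random = new SecureRandom();
  private final Set<String> issued = Collections.synchronizedSet(new HashSet<String>());
  private final Set<String> seen = Collections.synchronizedSet(new HashSet<String>());

  /** Hand a fresh nonce to the client to embed in its next request. */
  public String issue() {
    String nonce = new BigInteger(128, random).toString(16);
    issued.add(nonce);
    return nonce;
  }

  /** Accept a request's nonce only if we issued it and have not seen it before. */
  public boolean accept(String nonce) {
    return issued.remove(nonce) && seen.add(nonce);
  }
}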

Conclusion

The "best" [Edit: shared-secret basedauthentication system requires good passwords (for some value of good), hashes them with a really good hashing algorithm (my opinion is that a correctly-configured scrypt is the way forward), and defends against timing-attacks and simple brute-force attacks is the best.

However, you need to balance this against the needs of your user. How sensitive is the data you're storing? How well compartmentalised is your application? (i.e. can a compromise of one account lead to another user's data being exposed?) Do you have the time to implement all of this, or do you just want to scrypt the password and call it a day -- for now?

Either way, as I've mentioned before on this blog, I want shared-secret authentication to die.

Friday 2 November 2012

Memory-Hard Password Hashing in Java

Overview

As mentioned in my last post, password hashing is about trying to use your resources to outwit an attacker. Attackers have lots of CPU power to throw at the problem of guessing our users' passwords, and even have statistical methods for determining a user's most likely passwords.

What we need to do is slow the rate at which the attacker can make guesses. Sadly, slowing the raw guess rate isn't feasible on its own, as CPU power only ever increases; instead, we need to increase the cost of guessing a password. For that to happen, we need to make the hardware needed to guess a password as complex as possible, to impede attackers implementing our algorithm in hardware (e.g. FPGAs) or farming it out to GPUs.

Any source code in this article is instructional only, use the real scrypt in production!

Increasing Complexity

In order to increase the complexity of the hardware that an attacker needs to attack the system, we must not rely too heavily on the CPU, or the GPU, or any other single piece of hardware -- we must use them in concert. For general purpose computing machinery, this means putting the server's memory to use as well.

But, you say, what about in 5 years' time, when GPUs have 10GiB of onboard RAM and FPGAs have logic cell counts in the billions? As in the previous post, we need a way to increase the complexity of the password function as time goes on.

The heart of the scrypt algorithm is referred to as "SMix", which is actually an instance of ROMix (also defined in Percival's paper) with some specified parameters.

The Heart of The Matter

ROMix is quite a simple algorithm, once you get it.

You fill an array with the repeated hash of the password, then you repeatedly hash the running result xor'd with pseudo-randomly selected elements of that array.

In Java, this looks something like this:

import java.security.MessageDigest;
import java.util.ArrayList;

public class ROMix {

  private final MessageDigest hash;

  public ROMix(MessageDigest hash) {
    assert hash != null : "ROMix cannot work with no hash algorithm.";
    this.hash = hash;
  }

  public byte[] mix(byte[] input, int strength) {
    int n = (int) Math.pow(2, strength); // An array of 2^strength hashes.
    ArrayList<byte[]> byteArr = new ArrayList<byte[]>(n);
    byte[] digested = hash.digest(input);

    // First, we iterate the hash, saving each result into the array.
    for (int i = 0; i < n; i++) {
      byteArr.add(digested);
      digested = hash.digest(digested);
    }

    // Next, we access the array of hashes in a pseudo-random (but predictable) fashion.
    // The sign bit is masked off so the index is always non-negative.
    byte[] previous = byteArr.get(n - 1);
    byte[] integerdPrevious = byteArr.get((integerify(previous) & 0x7fffffff) % n);

    for (int i = 0; i < n; i++) {
      digested = hash.digest(xor(previous, integerdPrevious));
      // Here is where we access the array of hashes in a hard-to-predict way,
      // thus forcing either expensive recomputation, or storing the
      // hashes in RAM (also expensive).
      integerdPrevious = byteArr.get((integerify(previous) & 0x7fffffff) % n);
      previous = digested;
    }
    return digested;
  }

  private byte[] xor(byte[] a, byte[] b) {
    assert a.length == b.length : "Cannot xor two byte arrays of dissimilar length.";
    byte[] res = new byte[a.length];
    for (int i = 0; i < a.length; i++) {
      res[i] = (byte) (a[i] ^ b[i]);
    }
    return res;
  }

  /**
   * This method is designed to take a byte[] object (e.g. a hash) and
   * return an integer based on the contents of that byte[].
   *
   * It does not need to represent the contents of the byte[] exactly in
   * one int.
   *
   * @param a The byte[] that you wish to turn into an int.
   * @return A fuzzy integer-ised version of the byte[].
   */
  private int integerify(byte[] a) {
    assert a.length >= 4 : "Cannot integerify something with fewer than four bytes.";
    int res = 0;
    // Combine the last 4 bytes into one int.
    for (int i = 0; i < 4; i++) {
      res |= (a[a.length - 1 - i] & 0xff) << (8 * i);
    }
    return res;
  }
}

Now, this is quite a complex class, and frankly, the mathematical expression of it is far simpler: just two basic recurrence relations (in my notes I assume that xor and integerify are well defined, rather than providing definitions).

This class guarantees that all 2n hash iterations are needed (since the memory-intensive second loop requires the final hash of the first loop to start), and that it is difficult to predict which of the hashes from the first loop will be required in the second loop. Therefore, you have to store them all, and then perform n difficult-to-predict memory accesses. These memory accesses should be uniformly distributed across the array.

It is these properties which make this function so expensive to compute: you need to be able to perform hashing operations quickly, and you need to be able to store the results of those computations and access them quickly.

Even if someone does manage to implement this on an FPGA, all you (the defender who has not yet lost all the data) have to do is increase the parameters to make it infeasible for that particular implementation to attack your hashes. You should be increasing the strength parameter regularly to ensure that cutting-edge hardware cannot guess your passwords at a reasonable rate.

Still Alive

This method does not render you (or your users) completely immune to brute-force attacks. If an end user picks a 5-character password, then an attacker doesn't need many guesses (especially with the statistical methods mentioned above) to work it out.

With a 5-character (random) password, the worst-case brute-force attack on a well configured scrypt will take 25 years (assuming a constant cost of guessing, which is false in reality). In reality, the attacker will be able to guess human-generated secrets with shocking efficiency, as well as parallelising the attack. If the attacker can get 1000 instances of scrypt running, then the brute-force attack takes only a week. 1000 instances is a small botnet these days.
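For a rough sense of where numbers like that come from (these are my own assumptions, not figures from the scrypt paper): with a 95-character printable alphabet there are 95^5, roughly 7.7 x 10^9, five-character passwords; at roughly 100ms of scrypt work per guess that is about 7.7 x 10^8 seconds, or around 24 years, in the worst case. Split across 1000 parallel instances, it drops to roughly 9 days.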

These methods (e.g. bcrypt, scrypt, etc.) still do not replace a good password policy. Sadly, the better your policy, the more difficult it is to get users through sign-up, and the more heavily your password reset tool will be used.

A better method is to have a relaxed baseline of, say, 9 characters, plus a good strength meter which only reads strong when the password is truly strong (a pass phrase that is statistically unlikely given a large corpus of text, and includes upper case, lower case, digits and symbols). Even this doesn't render you immune to a user generating a really good pass phrase once and using it everywhere, including on low-security sites -- sites which then suffer a breach and spill their users' credentials everywhere.

Where We're Heading

Well, where we should be heading, in my opinion.

I think that users should provide the sites they wish to authenticate against with a public key, and store the private key on a smart card or a modified YubiKey (See the clarification for a suggestion of that modification). If it's a smart card, the key should be encrypted using a good PIN.

Sadly, this is unlikely to happen. The YubiKey idea is the most implementable; a smart card could provide something similar.

But they don't protect against a man-in-the-middle. Both devices will provide an authentication token even if the request is not valid. A demand for authentication could be (theoretically) passed to a smart card, along with a nonce, timestamp and digital signature. If the signature is valid (as determined by an internal certificate, perhaps), then it may give out the authentication token, otherwise it is withheld.

Until then, as users of shared-secret based authentication systems, ensure that you use a good pass phrase on every site you visit, and ensure that the pass phrase is unique to that site. A really good way to do that is to use something like LastPass or KeePass, which will generate a good, unique password for every site you visit, and store it encrypted on your machine.