Monday 16 February 2015

Keeping a Web Service Secure

This post is aimed at those tasked with developing and maintaining a secure system, but who have not had to do so previously. It is primarily a sign-posting exercise, pointing initiates in the right direction. It is not gospel; many things require common sense, and there is rarely a right answer. Everything is a trade off; performance for security, usability for development time, backup size for recovery ability. To do this effectively, you need to consider your systems as a whole, including the people running and building them.

Keeping a web service adequately secure is quite a task. Many of the things covered will simply not apply, depending on your business model. For instance, web services which use services such as Google App Engine, Heroku, or OpenShift will not need to keep up with OS patches. Those that are for intranet use only may be able to get away with weaker password policies if they're not usually exposed to the internet.

Use your common sense, but be prepared to backup your decisions with evidence. When the executioner comes, you need to be able to honestly argue that you did the right thing, given the circumstances. If you can't do this honestly, you will get found out, your failure will be compounded, and the outcomes all the more severe.

The whole aim of these recommendations is to give you a head start removing potential avenues of attack for your adversaries, providing some mitigation to recover from an attack, and giving you plenty of places to go do further research.

You'll need to do the basics for securing your infrastructure. As with most other things, this is an ongoing process.

You will likely need, as a bare minimum, a properly configured firewall. I'm a big fan of pf, but IPTables is probably more widely used. Use what's appropriate for your platform, do some reading, and make sure you've got it dropping as much un-necessary traffic as possible, in and out of your organisation.

For remote access, most services go with SSH. OpenSSH is easily hardened. I'd recommend having a good read of Lucas' SSH Mastery to familiarise yourself with the finer points of the service. If your platform doesn't have SSH, it's likely that you'll have something similar, but you'll have to do your research.

If your company has more than a small handful of employees, sudo is an absolute life saver. Once again, Lucas provides a good book on utilising sudo for access control & auditing in Sudo Mastery.

You must keep any services you run up to date. This is everything, database, web server, remote access, kernel, etc. This will entail some downtime. A little and often goes a long way towards preventing major outages.

Often, people have perfectly good intentions, but fail to keep up with patches. There are three main causes.

The first is wanting to avoid any and all scheduled downtime. If you plan correctly, and notify your users ahead of time, this is no issue. People largely don't mind the service disappearing at a well-known time for an hour a month.

The second is harder to combat. Ignorance of the updates existence, the security implications of not applying the patch, or knowledge of how to apply them. You need to collate a list of your dependencies (web servers, databases, OS patches, etc.) and their security announcement pages. This list then needs to be checked, and any relevant issues assessed. You should also be aware of how you actually update each piece of software. Many operating systems from the unix family make this easy with a package manager, but I don't know how the rest of the computing world fares in that arena.

Typically, Mitre will provide an assessment of the risk posed to organisations by a vulnerability, which is normally a reasonable estimate of the risk posed to your organisation. High risk vulnerabilities may need re-mediation outside your normal downtime cycle, lower risk ones may be able to wait.

The third is that many people fear updates as they can break things. This risk only gets worse with time. If you're not keeping up with the latest security patches, then when things break, you need to work out which of the 17 patches you just applied did it. With just one or two patches, you can file a bug report, or determine if you're mis-using the dependency much more easily.

A hidden set of dependencies are those of your own bespoke software, which are pulled in by some form of build process. Many organisations simply ignore is the dependencies of their bespoke software. For instance, with a large Java application, it's all too easy to get left behind when ORMs and IoCs get updated regularly. You must be regularly checking your dependencies of your bespoke software for updates, and the same arguments regarding OS level patching apply. Get too far behind, and you'll get bitten -- hard.

I'd recommend turning on OS-level exploit mitigation techniques provided by your OS. This is typically things like address-space layout randomisation (ASLR) and W^X, but plenty of others exist. Should you be on holiday when a problem arises, these will buy you some time to either put a phone call in to someone to get the ball rolling, or get to an internet cafe and open a connection with a one-time password (don't trust the internet cafe!). They also tend to frustrate attackers, and may prevent some specific vulnerabilities from being exploitable at all.

This is a huge field, and some systems don't have all of these protections turned on by default, or simply lack the feature at all. Look very closely at your OS, and the dials that you can turn up.

Other systems of policy enforcement, such as SELinux may also help, but I've not had much chance to use them. Read up on them, both pros and cons, and work out if they're of much use to you, and if they're worth your time.

The next class of problems is running obviously bad software. Even if you keep your dependencies up to date, lots of people will fall prey to this. One of the worst offenders is WordPess (and it's plugin ecosystem), but other system have not had a clean run.

Check out the history of a project before utilising it. If you've decided that it's bad, but nothing else works for you, segregate, segregate, segregate. Then monitor it for when it does get compromised. This can be done with a simple chroot, or for beefier solutions, a separate system with an aggressive firewall between it and any user data can help. For monitoring, you'll probably want something like Nagios, but there's plenty of alternatives.

If possible, you should try to segregate and monitor your bespoke software as if it were in the class of services with a poor history, but this may not be possible.

On the subject of monitoring, you should be monitoring lots of performance metrics, such as database load, queries per second, errors/second in logs, etc. These will help tell if you something funny (security related or otherwise) is up, before it becomes a major problem for you.

You may also chose to deploy an intrusion detection system (IDS). Snort is widely recommended, and comes with a whole bunch of basic rules. You'll need to tune this to filter out the noise and leave the signal. This is no small task, prepare yourself for some serious documentation divin' with most IDS/

Once you're at this point, you should be have a relatively decent security posture. But there's two crucial things I've not covered, relating to recovery from an incident.

The first is backups. Make them, store the offline (at least one business has gone under from this), and test them, and your restore processes regularly. If something bad happens, you need these to be out of reach of the threat, whether it's deliberate or otherwise.

Secondly is general incident response. However, the one I'd most like to bring to the fore is a breach communications package. This is a set of templates covering several scenarios which can be used to notify customers, fend off the press, and put on your website to explain the situation. If you're a big company, and you expose customer data, the press will call. If you expose a lot of user data, the press will call. If you are compromised by a highly news-worthy adversary (e.g. ISIL), the press will call.

Do not waste your time trying to talk to journalists, and do not waste time writing a press release under pressure. You'll do a bad job of it, and things will go from bad to worse very quickly; especially if reporters think you're available for comment.

And finally, what's the point of all this, if you're not going to be developing some bad-ass web app to scratch a particular itch.

I strongly recommend that you use some variety of a secure development lifecycle, such as OpenSAMM, BSIMM, or Microsoft's SDL. Obviously, these won't solve your issues, but they should help alleviate them.

One of the most important things on each lifecycle is developer training. Without that, you're leaving a very large, obvious and attractive target for attackers.

Many of the things in a lifecycle will overlap with, and go beyond these recommendations. That's good and fine, but most of them discuss things in very abstract terms, and hopefully this post puts some of the requirements down more concretely.

Hopefully, that should give you a good place to start. The important thing is discipline. The discipline to check for & apply patches, follow a secure development lifecycle, impose some reasonable restrictions on yourself and therefore your attackers.

Developing software is very hard. Developing secure software is maddeningly hard. For this reason, there is no security silver bullet. Anyone trying to sell you a one-stop solution is full of it. It requires careful research and implementation; and it will take time. It is easiest to do this from the start, but by injecting quality control activities into an existing process, you can begin to improve an organisation's security posture, but it will be slow. In many cases, bespoke software will have to be reviewed or simply thrown away, services migrated to better-configured infrastructure, and so on. It takes time, a lot of time.

Tuesday 10 February 2015

Madness from the C

So, I've decided to take the plunge.

C is so widely used that not being quite intimately acquainted with it is a definite hinderance. I can read C comfortably for the most part, ably wielding the source of a few kernels or utilities to track down bugs and determine exactly how features are used is something that's not beyond my remit.

I might actually try to write some C. Originally I was going to be all hipster and show how to build your web 2.0 service in C using FastCGI, because the 90s are still alive and well. Also, you can do some pretty awesome things relating to jailing strategies (e.g. chroot+systrace, chroot+seccomp or jail+capsicum), and the performance can be good.

Unfortunately, when it comes to writing C, I freeze up. I know there is so much left to the imagination; so much to fear; so much lurking in the shadows, waiting to consume your first born. And I hate giving out advice that could lead to some one waking up to find ETIMEDOUT written in blood on the walls; even if the "advice" is given in jest. (Thanks to Mickens for the wording)

I speak of the dreaded undefined behaviour.

Undefined behaviour is a double edged sword. In tricky situations (such as INT_MAX + 1), it allows the compiler to do as it pleases, for the purposes of simplicity and performance. However, this often leads to "bad" things happening to those who transgress in the areas of undefined behaviour.

I suggest that if you are a practicing C developer, and you don't think this is much of a problem, or you don't know much about it, you read both Regehr's "A Guide to Undefined Behaviour in C and C++" and Lattner's "What Every C Programmer Should Know About Undefined Behaviour" in full.

I was in the camp of "undefined behaviour's bad, but not too much of a problem since it can be easily avoided" camp, since I am more security-oriented than performance-oriented. I much prefer Ada to C.

That was, until I started in on Hatton's "Safer C".

The book is well written, clear, and direct.

Broadly speaking, all was going well, until I got to page 49. Page 49 contains a listing of undefined behaviour, as do pages 50, 51, 52, 53, 54, 55, and about one third of 56. That's over 7 pages of undefined behaviour.

It could be made better, if these were all weird, corner cases, that the compiler clearly could not detect and throw out as "obviously bad," and not even just semantically bad (for example, dereferencing a null pointer); but those that are syntactically bad.

Things like "The same identifier is used more than once as a label in the same function" (Table 2.2, entry 5) and "An unmatched ' or " character is encountered on a logical source line during tokenization" (Table 2.2, entry 4) and my current favourite, "A non-empty source file does not end in a new-line character, ends in a new-line character immediately preceded by a backslash character, or ends in partial preprocessing token or comment" (Table 2.2, entry 1).

Just look at those three. Syntactic errors, which, according to the standard, could bring forth the nasal demons.

Let me put it more bluntly. A conforming compiler may accept syntactically invalid C programs, and then emit (in effect) any code it wants.

Now, clearly, most compilers do not do this; they give define these undefined behaviours as syntax errors. The thing which really scares me is that the errors presented are from the first page of the table, and there's another six pages like that to go.

Further, I think I must come from a really luxurious background. I expect compilers to do everything reasonable to help program writers avoid obvious bugs, rather than simply making the compiler writers life easier.

A compiler is to be written for a finite set of hardware platforms. It needs to be used by countless other (probably less experienced) programmers to produce software which may then go on to be used by an unfathomable number of people. It's no small thing to claim that a large fraction of the world's population have probably interacted, in one way or another, with the OpenSSL code base, or the Linux kernel.

The reliability of these programs is directly correlated to the "helpfulness" of the language they are written in, and as such, C needs to be revisited with a view to "pinning down" many of the undefined behaviours; especially those which commonly lead to serious safety or security vulnerabilities.

Wednesday 4 February 2015

Last Christmas, I gave you my HeartBleed

With HeartBleed well and truly behind us, and entered into the history books, I want to tackle the idea that HeartBleed would've definitely been prevented with the application of formal methods -- specifically those that culminate in a proof of correctness regarding the system.

Despite sounding really good, this is actually a false statement, and the reasoning isn't actually all that subtle.

To demonstrate, I'm going to use a common example, the Bell and LaPaulda model for mandatory access control.

The model is, roughly, that everything is a resource or a subject. Both resources and subjects have classifications (we'll just limit ourselves to two), for example, "low" and "high". Subjects with low classification can only read low resources. Subjects with high classification may read both high and low resources.

The specification of the write operations is not currently relevant.

So, let's try and specify this in (something like) the Z notation; we need two operations, ReadOk and ReadError. ReadOp is a combination of both of them.

I apologise in advance, this is the bastardisation of Z that I could work into a blog post without inclining images.

ReadOk:

Variables:
SubjectClassification?
ResourceId?
Result!

Predicates:
(SubjectClassification? = high) || (classificationOfResource(resourceFromId(resourceId?)) = low)
Result! = resourceFromId(resourceId?)

What this can be read as is, given inputs SubjectClassification and ResourceId, and an Result output; and the predicate (SubjectClassification? = high) || (classificationOfResource(resourceFromId(resourceId?)) = low), then the Result output is the resource.

This embodies the "Read down" property of Bell & LaPaulda. If a subject has high classification, then they can read, otherwise, the resource must have a low classification.

The ReadError operation is similarly designed;

ReadError:

Variables:
SubjectClassification?
ResourceId?
Result!

Predicates:
(SubjectClassification? = low) && (classificationOfResource(resourceFromId(resourceId?)) = high)
Result! = error

This basically, roughly, very poorly, states that if you're a low classification subject, and you try to read a high classification resource, you'll get an error.

We now know that, for our specification, we're done.

This looks really good! A decent specification that we can really work to! Let's implement it!

public class BLPFileSystem {

  public enum Classification { LOW, HIGH }

  private final Map<ResourceId, Resource> resourceMap;

  public Resource readById(final ResourceId id, Classification subjectClassification) {

    assert id != null;

    assert resourceMap.contains(id);



    Resource r = resourceMap.get(id);

    Classification resClass = r.getClassification();

    r.setClassification(LOW);



    if (subjectClassification == HIGH) {

      return r;

    } else if (resClass == LOW) {

      return r;

    } else {

      throw new IllegalAccessException();

    }

  }

}
 
Also, imagine the rest of the infrastructure is available and correct...

Well, this is obviously broken. If you've skipped the code, the offending line is "r.setClassification(LOW);"

That's right, this method declassifies everything as it goes along!

Interestingly, this completely meets our specification. Now, if this was an automatically verified (engage hand-waving/voodoo), I could push this to production with no hassle.

This isn't just a contrived example, but a demonstration of a general issue with these sorts of things.

A specification is usually a minimum of what your software must do -- it usually does not declare a maximum. In OpenSSL's case, the software did the minimum that it was supposed to, it also went above and beyond it's specification to work in new and interesting ways; which turned out to be really bad.

Even with our file system, we can add predicates to ensure that the state of the read file is not changed by reading it; but then the state of other files could be modified. A bad service could declassify every other file when reading any file.

It's not easy to just put "must not" into the spec. Many cryptography systems must run in adversarial situations, for example, sharing virtual machines sharing hardware with adversaries. These systems must protect their key material despite a threat model in which the adversary can measure the time, power and cache effect of the system.

In our example, the system can still violate the "spirit" of the specification by leaking information based on timing-dependant operations.

In part, this exists because the specification is a "higher level" of abstraction, and the abstraction is not perfect.

Now, that is not to say that we should abandon formal methods. Far from it, these, and related formalisms are the kinds of things which save lives, day in, day out. The overall quality of most projects would be vastly improved if some degree of formal methods had been applied from the start. Doubly so if it's something as unequivocally bad as the OpenSSL source. It's just that, in the face of security, our tools need some refinement.