Saturday 13 December 2014

Securing Strings

This is not about String as in Java, or std::string in C++. This is about the strings program, part of GNU binutils (the GNU development tools).

The strings program takes a file and prints out the printable strings.

Recently, Michal Zalewski (aka lcamtuf) used his American Fuzzy Lop (afl) tool to fuzz a variety of GNU tools, one of which was the strings program. The upshot was that it's a very bad idea to run strings on untrusted input.

I should make it clear that I don't blame the author of strings for not undertaking such scrutiny when it was written. Most software starts off as a personal prototype or tool and grows. It's silly to start demanding the most rigorous secure software development methodology from one author and their pet project.

Once the pet project escapes and starts being relied on by other people, the dynamic obviously changes. More questions start being asked about who is responsible for the correctness of the program -- if anyone is responsible for it at all, since most software comes with a disclaimer of warranty.

I will not cover program verification tools here. They're often simply overkill, and I think this is likely one of those cases.

Anyways, onto my main point. How do we go about solving this problem once and for all?

Audit ALL the things!

This is OpenBSD's primary approach. All commits must be reviewed, and almost all of the code base was reviewed somewhere around version 2 or 3, after Theo de Raadt's own OpenBSD machine was compromised (shock, horror!).

This is a time-consuming process, and there are no guarantees. If one person misses a subtle bug, what's the chance that the next person misses it too? I'd wager that the chance is "high," but I'm mostly speculating.

I'm actually very pro code review and audit. I like that TrueCrypt (now VeraCrypt) is getting audited, and at my work, I'm pushing hard for code review before any code goes live. I'm also aware that it is by no means a perfect tool, and that it slows down the journey from design to deployment.

Use a Memory Safe Language

We could use (for example) Java, or even Ada (with the right GNAT flags), to re-write the strings tool and completely avoid memory safety vulnerabilities.

I like this idea for new projects; I would never suggest starting a new project in C, Objective-C or C++ because they all inherit C's badness with memory.

But... Java requires a runtime (the JVM) and its startup time is non-trivial. Most people don't know Ada, and Haskell has even fewer engineers to its name.

Further, for Java especially, you're relying on the security of the underlying runtime, which hasn't had a great track record.

I'd argue that Ada is the best choice out of the lot, but I'm biased. I really like Ada.

Obviously, re-writing extant programs in an entirely new language is not the smartest idea unless there's a really good reason. It's time-consuming, and you're likely to re-introduce bugs that you coded out of the original.

Apply Good Security Principles

What I actually mean by this is that you should apply the principle of least privilege. That means restricting exactly what the program can do, so that, if compromised, it can't do much harm, even if the attacker gains complete control over the process.

On Linux, this can be achieved to a very fine-grained level with the seccomp system call, and on FreeBSD, there is the Capsicum subsystem.

What these allow you to do is enter a "secure" mode, whereby all but a few specified system calls are completely disabled. An application attempting a banned system call is killed by the kernel. Typically, file descriptors opened before entering the secure mode remain usable within it.

For strings, you'd read the command line, open your target file read-only (checking that it exists, is readable, etc.) and then enter the secure mode, whereby you can only read the single opened file. Should an RCE be found, the adversary would be able to read as much of the open file as they like, but they would be contained within that single process. They could not spawn a new shell (that would involve a banned system call). They could not open file descriptors to sensitive files (/etc/passwd, browser caches, etc.), since creating a new file descriptor is banned. They could not open a network socket and exfiltrate data to a C&C host, as creating sockets is banned too.

The only way out would be to find a privilege escalation exploit in the kernel using the system calls that aren't immediately filtered.

I actually like this idea best, since it can easily be combined with code review. You aim to reduce the number of security-relevant bugs using code review (and testing, though you're unlikely to cover much of the state space). Any bugs that slip through the net become safe crashes, not full compromises.

First, you implement the minimal "jail" (not chroot or FreeBSD jails, but seccomp or Capsicum per-process jails) and wire the program into it. You then get your colleagues to review the jail implementation in your program.

Saturday 6 December 2014

A Simple Mail Merge Application

My other half has just finished writing their first full-length novel. As such, they'd like to send it off to agents.

My first thought was OpenOffice Base. Have them enter the agent details into a table, and use OpenOffice Writer's mail merge facility. This, however, did not work, since Writer's mail merge facility lacks the all-important attachments functionality.

If OpenOffice had this functionality, we'd have been up and running in about 15 minutes. I'm surprised it doesn't; it's the ultimate way to automate the job search, so surely an unemployed programmer would've contributed the functionality at some point...

But, leaving that by the wayside, I wasn't about to give up on OpenOffice just yet. I know that OpenOffice Base's files are just cunningly zipped HSQL databases, with some metadata surrounding them.

So, I thought I'd unzip the OpenOffice Base file and have a small Java application read the HSQL database, put the results through the Velocity template engine and send off the email.

This would've involved sneaker-netting the ODB file back and forth between my machine and my partner's, but that seemed OK. They'd enter agents during the day, and I'd "send" them all overnight. No biggy.

This was also a bust. Once my Java application, with its almighty HSQL JDBC jar, had touched the database, it seemed to taint it. I think it bumped a version field in the database. This meant that OpenOffice Base refused to open the file after even one round with my Java program.

So, plan C. SQLite is an amazing embedded database. Far faster and nicer than HSQL -- it even comes with a neat little command line interface.

I set up some test data in an SQLite database and pointed my Java program at it. Success!

So then I pointed OpenOffice Base at the file, so my partner could enter some data. Failure! OpenOffice Base had an issue with (I think) the scrolling modes available, so that was right out; it wouldn't even show the data in the OpenOffice interface. Sad times.

Plan D. Remember, you've always got to have at least 3 fall-back plans when developing software, otherwise nothing will ever work.

PostgreSQL to the rescue! I set up Postgres on my machine and opened a port for it. On my partner's machine, I tested that OpenOffice Base could connect to my Postgres. I then tried dropping some test data into the Postgres database and running my Java program, configured to use Postgres... Success!!

Now all I need to do is figure out how to send MIME multi-part HTML emails correctly ... ugh.
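For the record, the shape I'm after is a multipart/mixed message whose first part is a multipart/alternative (plain text plus HTML), with the attachments as siblings. Here's a minimal sketch using JavaMail (javax.mail); the addresses, subject, SMTP host and file name are all placeholders, not the real application's values:

import java.util.Properties;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;

public class MergeMailer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("mail.smtp.host", "localhost"); // placeholder SMTP relay
        Session session = Session.getInstance(props);

        MimeMessage message = new MimeMessage(session);
        message.setFrom(new InternetAddress("novelist@example.com"));
        message.setRecipients(Message.RecipientType.TO, "agent@example.com");
        message.setSubject("Submission: <novel title>");

        // multipart/alternative: plain-text and HTML renderings of the cover letter
        MimeBodyPart textPart = new MimeBodyPart();
        textPart.setText("Dear Agent,\n\n<cover letter>\n");
        MimeBodyPart htmlPart = new MimeBodyPart();
        htmlPart.setContent("<p>Dear Agent,</p><p>&lt;cover letter&gt;</p>", "text/html; charset=utf-8");
        MimeMultipart alternative = new MimeMultipart("alternative");
        alternative.addBodyPart(textPart); // least-preferred rendering goes first
        alternative.addBodyPart(htmlPart);

        // multipart/mixed: the cover letter, then the attachments
        MimeBodyPart coverLetter = new MimeBodyPart();
        coverLetter.setContent(alternative);
        MimeBodyPart manuscript = new MimeBodyPart();
        manuscript.attachFile("manuscript.pdf"); // placeholder attachment
        MimeMultipart mixed = new MimeMultipart("mixed");
        mixed.addBodyPart(coverLetter);
        mixed.addBodyPart(manuscript);

        message.setContent(mixed);
        Transport.send(message);
    }
}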

Anyway, this has been a day or so's worth of work on my part. I suspect I'll have another half-day to get HTML emails working correctly, and then I'll be sorted. Hopefully it'll enable my partner to effectively reach a whole host of agents without writing out the same damn cover letter, attaching various PDFs and DOC files, etc. over and over.

Once this is all wrapped up, I may open source it. I'll need to tidy the code, add tests and documentation, but it may be of use to someone.

The moral of the story is that HSQL is a difficult database to work with, SQLite is always awesome but some things don't support it, and PostgreSQL is the best RDBMS since sliced bread.

Sunday 28 September 2014

How I fixed Shellshock on my OpenBSD Box

Well, I am on a "really old" OpenBSD, and I couldn't be bothered updating it right now. It was really arduous:

[Sun 14/09/28 10:52 BST][p0][x86_64/openbsd5.2/5.2][4.3.17]
zsh 1044 % sudo pkg_delete bash
bash-4.2.36: ok
Read shared items: ok

In reality, this is only possible because, as a sane operating system, OpenBSD has no dependencies on anything other than a POSIX-compliant sh, of which there are several available.

To me, just another reason to avoid specific shells and just target POSIX. When the shit hits the fan, you'll have somewhere to hide.

When I tried to do the same on my work (FreeBSD) box, I came up against several issues. The main one being that lots of packages specify bash as a dependency.

At some point, I'll write a blog post about the functions of that server, and how I've hardened it.

Friday 26 September 2014

Time for Regulation in the Software Industry?

Many pieces of software have a clause similar to:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
You may recognise that as a large portion of the BSD 2-Clause open source license. Not to pick on BSD style licenses, section 15 of the GPL v3, section 11 of the GPL v2, section 7 of the Apache license, and a substantial portion of the NCSA and MIT licenses also have this clause. The wording is very similar from license to license. Let's look at how this plays out:

  • Linux: GPL
  • FreeBSD: BSD Style
  • OpenBSD: BSD Style
  • Bash: GPL
  • GCC: GPL
  • Clang/LLVM: NCSA
  • Apache HTTPd: Apache
  • Nginx: BSD
  • Postgres: "BSD-Like", Postgres License 
  • MySQL: GPL
  • Apache Struts: Apache
This represents a huge portion of the internet-facing world, powering popular services such as Netflix, tumblr, Facebook, and even the London Stock Exchange. Many devices run Android, and lots of the routers separating home networks from the Internet run variants of Linux.

This is not to exclude commercial software; even Microsoft puts these clauses in its licenses.

I've even come across behaviour in The Co-operative Bank's online banking platform and TSB's Verified by Visa platform (URL structure, exception handling behaviour) that suggests they have components built on Apache Struts.

The basic meaning (and I am not a lawyer by any stretch) is that the person or entity who produced the software is, as far as legally possible, not Responsible in the event of Bad Stuff.

So, the story so far:
  • Developers kindly write software, entirely disclaim warranty or liability.
  • Organisations set up paid-for services off the back of this software, without verifying that the software is fit for purpose, since auditing software is expensive.
  • End users then entrust the organisations, via the un-audited software, with their money and personally identifying information (PII).
The end users -- the ones risking their bank deposits, their PII -- are, in many cases (banking has specific protections), the ones who are basically expected to evaluate the software which they are unknowingly going to use. They are in no position to assess or even understand the risks that they are taking.

With cars, there are safety ratings, Euro NCAP and the like. Electrical goods must meet a minimum safety standard (in the UK, most people look for the Kitemark) before they can be brought to market. When these goods fail to meet this standard, the seller is held accountable, and they, in turn, may hold their suppliers accountable for failing to supply components which meet the specified requirements. There is a well-known, well-exercised tradition of consumer protection, especially in the UK.

On the other hand, should you provide a service which fails to deploy even basic security practices, you're likely to get, at most, a slap on the wrist from a toothless regulatory body (in the UK, this is the Information Commissioner's Office, ICO). For example, when Tesco failed to store passwords securely, the only outcome was that the ICO was "unimpressed". There was likely no measurable consequence for Tesco.

Banks, who usually are expected to be exemplary in this field, respond very poorly to research that would allow consumers to differentiate between the "secure banks" and the "insecure banks". This is the exact opposite of what needs to happen.

The lack of regulation, and the strangling-off of information to consumers, is leading to companies transferring ever larger risks to their clients. Often, the clients have no option but to accept this risk. How many people have a Facebook account because it's the only way to arrange social gatherings (because everyone's on Facebook)? How many people carry and use EMV (in the UK, Chip and PIN) debit or credit cards? Banks are blindly rolling out NFC (contactless payments) to customers, who have no choice in the matter, and who, in many situations, simply do not know that this is an unsafe proposition, and could never really know.

An illuminating example is eBay's recent antics. The short version of the story is that a feature for sellers (and hence a profit-making feature for eBay) has turned out to be a Bad Idea, exposing buyers to a very well-crafted credential-stealing attack. This has led to the exposure of many buyers' bank details to malicious third parties, who have then used the details to commit fraud.

In this situation, eBay is clearly gaining by providing a feature to the sellers, and by shifting the risk to the consumer. Further, because this is a profit-making hole, and closing it could break many sellers' pages (thus incurring a huge direct cost), eBay is spinning its wheels and doing nothing.

The consumers gain little, if anything from this additional risk which they are taking on. Like a petrochemical company building plants in heavily populated areas, the local population bear the majority of the risk and do not share in the profits.

This is an unethical situation, and regulatory bodies would normally step in to ensure that the most vulnerable are suitably protected. This is not the case in software & IT services: the regulatory bodies are toothless, and often do not have the expertise to determine which products are safe and which are not.

For instance, OpenSSL and NSS (both open source cryptographic libraries) are used by The Onion Router (Tor) and the Tor Browser (effectively Firefox with Tor baked in), which in turn are used by dissidents and whistleblowers the world over to protect their identities where their lives or livelihoods may be at risk.

Both OpenSSL and NSS have won FIPS-140 (a federal standard in the US) approval in the past. Yet we have had Heartbleed from OpenSSL, and recently signature forgeries in NSS. Clearly, these bodies don't actually audit the code they certify, and when it does go catastrophically wrong, the libraries in question maintain their certifications.

For reference, the academic community has been concerned about the state of the OpenSSL codebase for some time. We've known that it was bad, we've shouted about it, and yet it retained its certified status.

Individual governments often influence this by only procuring high-assurance software and demanding that certain products meet stringent standards; failures of those products can therefore be financially damaging to the suppliers.

The UK government already has a Technology Code of Practice which government departments must use to evaluate IT suppliers' offerings. However, there are many fields over which the government has little remit, and it has no international remit. The US government has similar standards processes embodied in the Federal Information Processing Standards (FIPS, of which FIPS-140, mentioned above, is just one of many).

We also have existing standards processes, like the ISO 27000 series, which have a viral nature: the usage of an external service can only be considered if the organisation aiming for ISO 27000 certification can show that it has done due diligence on the supplier.

However, as mentioned above, these standards rarely mean anything, as they rely on the evaluation of products which very few people understand, and are hard to test. Products that shouldn't achieve certification do, as with OpenSSL.

Clearly, the current certification process is not deployed widely enough, and is not behaving as a certification process should, so we need something with more teeth. In the UK, we have the British Medical Association (BMA), which often takes its recommendations directly from the National Institute of Clinical Excellence (NICE). If a failure occurs, a health care provider (doctor, medical device supplier, etc.) will end up in court with the BMA present, and may lose their right to trade, as well as facing more serious consequences.

There is a similar situation in the UK for car manufacture, where cars have a minimum safety expectation, and if the manufacturer's product doesn't meet that expectation, the company is held accountable.

Another example is US cars being sold into China. Many US cars don't meet the Chinese emissions standards, and hence cannot be sold there.


We need something similar in software and services: an agreement that, like in other forms of international trade, vendors and service providers are beholden to local law.

We have existing legislation relating to the international provision of services. In many cases, this means that when a company (such as Google) violates EU competition law, it is threatened with fines. The laws are in place, but they need to be stronger, both in terms of what constitutes a violation and the measures that can be applied to the companies in question.

Currently, the internet is the wild west, where you can be shot, mugged and assaulted all at once, and it's your own fault for not carrying a gun and wearing body armour. However, the general public are simply not equipped to acquire suitable body armour or fire a gun, so we need some form of "police" force to protect them.

There will always be "bad guys", but we need reasonable protection from both the malicious and the incompetent. Anyone can set up an "encrypted chat program" or a "secure social media platform", but actually delivering on those promises when people's real, live PII is at stake is much harder, and should be regulated.

Acknowledgements

Many thanks to my other half, Persephone Hallow, for listening to my ranting on the subject, inspiring or outright suggesting about half the ideas in this post, and proofreading & reviewing it.

Sunday 14 September 2014

Achieving Low Defect Rates

Overview

Software defects, from null dereferences to out-of-bounds array accesses and concurrency errors, are a serious issue, especially in security-critical software. As such, minimising defects is a stated goal of many projects.

I am currently writing a library (OTP-Java) which provides several one-time password systems to Java applications, and this is a brief run-down of some of the steps I am taking to try to ensure that the library is as defect-free as possible. The library is not even alpha as yet. It still has several failing tests, but hopefully will be "completed" relatively soon.

Specification

Many products go forward without a specification. In many cases this is not a "bad thing" per se, but it can make testing more difficult.

When surprising or unexpected behaviour is found, it should be classified as either a defect or simply a user without a complete understanding of the specification. With no specification, there can be no such classification. The best that can be done is to assess the behaviour and determine whether it is "wanted".

As an example, I have seen a system where one part expected a user to have access to some data, and forwarded them on to it. The system for retrieving the data had more stringent requirements, and threw a security exception. Without a clear, unambiguous specification, there's no way of telling which part of the system is in error, and hence, no immediate way to tell which part of the system should be corrected.

I would like to make it clear that I am not advocating for every system to have a large, unambiguous specification. If the product is security- or safety-critical, I would argue that it is an absolute must, and many share this view. For most other systems, a specification is an additional burden that prevents a product from getting to market. If a failure of your system will sink your business or kill people, then a specification is wise. Otherwise, just listen to your users.

Defects

Given a specification, a defect is often defined simply as a deviation from that specification. Many defects are benign, and will not cause any issues in production. However, some subset of defects will lead to failures -- these are actual problems encountered by users: exception messages, lost data, incorrect data, data integrity violations and so on.

It is often seen to be most cost-effective to find and eliminate defects before deployment. Sometimes, especially in systems that do not have an unambiguous specification, this is extremely difficult to do, and in a sufficiently large system, this is often nearly impossible.

For a large enough system, it's likely that the system will interact with itself, causing emergent behaviour in the end product. These odd interactions are what make the product versatile, but also what make eliminating surprising behaviour nearly impossible, and it may even be undesired for certain products.

Static Analysis

Tools that can provide feedback on the system without running it are often invaluable. Safety critical systems are expected to go through a battery of these tools, and to have no warnings or errors.

I am using FindBugs on my OTP-Java project to try to eliminate any performance or security issues. I have found that it provides valuable feedback on my code, pointing out some potential issues.

There are also tools which will rate the cyclomatic complexity (CC) of any methods that I write. I believe that Cobertura will do this for me. This will be important, as a high CC is correlated with a high defect rate, and is also expected to make reading the code more difficult.

Testing

Randomised Testing

Fortunately, generating and verifying one-time passwords (OTPs) is a problem space with clear measures of success. For example, if I generate a valid OTP, I must be able to validate it. Similarly, if I generate an OTP and then modify it, the result should not validate.

This lends itself to randomised testing, where random "secrets" can be produced, and used to generate OTPs. These can then be verified or modified at will.

Other properties can also be validated: for example, that requesting a 6-digit OATH OTP actually does produce a 6-digit string, and that the output is composed entirely of digits.

For the OTP-Java project, I am using the Java implementation of QuickCheck, driven by JUnit.
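To give a flavour of the properties involved, here's a hand-rolled sketch of the two round-trip properties in plain JUnit (the QuickCheck version generates the inputs for you; this just spells the idea out). Hotp.generate and Hotp.validate are stand-ins, not the library's actual API:

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.security.SecureRandom;
import org.junit.Test;

public class OtpRoundTripTest {
    private final SecureRandom rng = new SecureRandom();

    @Test
    public void validOtpsValidate() {
        for (int i = 0; i < 1000; i++) {
            byte[] secret = new byte[20];
            rng.nextBytes(secret); // a fresh random shared secret each round
            String otp = Hotp.generate(secret, i, 6); // stand-in API
            assertTrue("generated OTP must validate", Hotp.validate(secret, i, otp));
        }
    }

    @Test
    public void modifiedOtpsAreRejected() {
        for (int i = 0; i < 1000; i++) {
            byte[] secret = new byte[20];
            rng.nextBytes(secret);
            char[] otp = Hotp.generate(secret, i, 6).toCharArray();
            int pos = rng.nextInt(otp.length);
            // replace one digit with a guaranteed-different digit
            otp[pos] = (char) ('0' + (otp[pos] - '0' + 1 + rng.nextInt(9)) % 10);
            assertFalse("modified OTP must not validate", Hotp.validate(secret, i, new String(otp)));
        }
    }
}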

Unit Testing

In OTP-Java, I've augmented the randomised testing with some test vectors extracted from the relevant RFCs. These test vectors, along with the randomised tests, should provide further confidence that the code meets the specification.
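For example, RFC 4226's Appendix D gives the expected 6-digit HOTP values for the ASCII secret "12345678901234567890" at counters 0 through 9. A sketch of how such vectors drive a test, again using the hypothetical Hotp.generate from above:

import static org.junit.Assert.assertEquals;

import java.nio.charset.StandardCharsets;
import org.junit.Test;

public class Rfc4226VectorTest {
    // RFC 4226, Appendix D: 6-digit HOTP values for counters 0 through 9
    private static final String[] EXPECTED = {
        "755224", "287082", "359152", "969429", "338314",
        "254676", "287922", "162583", "399871", "520489",
    };

    @Test
    public void matchesPublishedVectors() {
        byte[] secret = "12345678901234567890".getBytes(StandardCharsets.US_ASCII);
        for (int counter = 0; counter < EXPECTED.length; counter++) {
            assertEquals(EXPECTED[counter], Hotp.generate(secret, counter, 6));
        }
    }
}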

Usually, unit testing only involves testing a single component or class. However, with such a small library, and with such well-defined behaviour, it makes sense to test the behaviour of several parts of the system at the same time.

In my opinion, these tests are probably too big to be called unit tests, and too small to be called integration tests, so I've just lumped them together under "unit tests". Please don't quarrel over the naming. If there's an actual name for this style of testing, I'd like to know it.

Assertive Programming

I have tried to ensure that, as far as possible, the code's invariants are annotated using assertions.

That way, when an invariant is violated under testing, the programmer (me!) is notified as soon as possible. This should help with debugging, and will hopefully remove any doubt, when a test fails, as to whether it was a fluke (hardware, JVM failure, or other) or a genuine failure on my part.
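As an illustration, here's roughly what that looks like applied to the dynamic truncation step of RFC 4226, section 5.3 (assertions fire only when tests are run with java -ea):

// inside the HOTP implementation class:
// dynamic truncation (RFC 4226, section 5.3), with invariants asserted
private static int truncate(byte[] hmac, int digits) {
    assert hmac.length == 20 : "HMAC-SHA1 output must be 20 bytes, got " + hmac.length;
    assert digits >= 6 && digits <= 8 : "digit count out of range: " + digits;

    int offset = hmac[hmac.length - 1] & 0x0f; // low nibble of the last byte
    int binary = ((hmac[offset] & 0x7f) << 24)
               | ((hmac[offset + 1] & 0xff) << 16)
               | ((hmac[offset + 2] & 0xff) << 8)
               | (hmac[offset + 3] & 0xff);

    int otp = binary % (int) Math.pow(10, digits);
    assert otp >= 0 : "truncated OTP must be non-negative";
    return otp;
}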

This has, so far, been of a lot of use in randomised testing: when there have been test failures, the assertions have shown exactly why.

It also helps users of the library. If my input validation is not good enough, and users exercise the library as part of their own testing, they will also, hopefully, be helped by the assertions, as the assertions help explain the intent of the code.

Type System

While Java's type system does leave a lot to be desired, it is quite effective at communicating exactly what is required and may be returned from specific methods.

Unlike the reference implementation in the RFC (mmm, "String key", "String crypto", and so on), I have tried to use appropriate types in my code, requiring a SecureRandom instead of just a byte[] or even a Random. This conveys the fact that this is a security-critical piece of code, and one shouldn't use "any old value", as has often happened with OpenSSL's API (see also: predictable IVs), leading to real vulnerabilities in real products.
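A sketch of the idea, with a hypothetical class name; the point is that handing the constructor a plain java.util.Random, or a hard-coded byte[], becomes a compile error rather than a runtime surprise:

import java.security.SecureRandom;
import java.util.Objects;

// hypothetical key generator: the type signature insists on a CSPRNG
public final class SharedSecretGenerator {
    private final SecureRandom rng;

    public SharedSecretGenerator(SecureRandom rng) { // not Random, not byte[]
        this.rng = Objects.requireNonNull(rng);
    }

    public byte[] newSharedSecret() {
        byte[] secret = new byte[20]; // 160 bits, RFC 4226's recommended length
        rng.nextBytes(secret);
        return secret;
    }
}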

Shielding the user from common mistakes with a "sane" or "obvious" API is as much my job as it is the final user's. The security of any product which relies on the library rests on the library's correct specification and implementation, as well as its correct use. Encouraging and supporting both is very important.

Code Coverage

Code coverage is often a yardstick for how effective your testing is. Code coverage of 100% is rarely possible. For instance, if you use Mac.getInstance("HmacSHA1"), it's nearly impossible to trigger the NoSuchAlgorithmException.
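To make that concrete: every compliant JRE is required to support HmacSHA1, so the catch branch below is effectively dead code under test, and 100% branch coverage is off the table. One common compromise is to convert the "impossible" checked exception into an error:

import java.security.NoSuchAlgorithmException;
import javax.crypto.Mac;

final class MacFactory {
    static Mac newHmacSha1() {
        try {
            return Mac.getInstance("HmacSHA1");
        } catch (NoSuchAlgorithmException e) {
            // unreachable on a compliant JRE: HmacSHA1 support is mandated,
            // so no test can ever drive execution into this branch
            throw new AssertionError("JRE is missing mandatory HmacSHA1", e);
        }
    }
}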

However, many tools provide branch coverage as well as line coverage. Achieving high coverage can boost your confidence, but when using complex decision cases (for example, if (a && b || !(c && (d || e)))), it's very hard to be sure that you've covered all the ways of "entering" a block.

Cyclomatic complexity (CC) can help here. As a rough guide, if a method has a CC of 2, you should have at least 2 tests for it. Although this is still just a rule of thumb, it does help me feel more confident that, to a reasonable level, all eventualities are accounted for.

Conclusion

Many products don't have a specification, which can make reducing surprising behaviours difficult. Similarly, not all defects lead to failures.

However, even without a specification, some of the techniques listed in this post can be applied to try to lower defect rates. I've personally found that these increase my confidence when developing, but maybe that just increases my appetite for risk.

I am by no means saying that all of the above tools and techniques must be used. Similarly, I will also not say that the above techniques will ensure that your code is defect free. All you can do is try to ensure that your defect rate is lowered. For me, I feel that the techniques and tools in this post help me to achieve that goal.

Friday 5 September 2014

The Future of TLS

There exist many implementations of TLS. Unfortunately, alongside the proliferation of implementations, there is a proliferation of serious defects. These defects aren't limited to the design of the protocol; they are endemic: the cipher choices, the ways in which the primitives are combined, everything.

I want to lay out a roadmap to a situation where we have an implementation of TLS that we can be confident in. Given the amount of commerce currently undertaken using TLS, and the fact that other, life-and-death (for some) projects pull in TLS libraries (don't roll your own!) and entrust them with their users' safety, I'd say that this is a noble goal.

Let's imagine what would make for a "strong" TLS 1.3 & appropriate implementation.

We start with the selection of cipher and MAC primitives. We must ensure that well-studied ciphers are chosen: no export ciphers, no ciphers from The Past (3DES), and no ciphers with major known weaknesses (RC4). Key sizes must be at least 128 bits; if we're looking to defend against quantum computers, at least 256 bits. A similar line of reasoning should be applied to the asymmetric and authentication primitives.

Further, the specification should aim to be as small, simple, self-contained and obvious as possible to facilitate review. It should also shy away from recommendations that are known to be difficult for programmers to use without a high probability of mistakes, for instance, CBC mode.

Basic protocol selections should ensure that a known-good combination of primitives is used. For example, encrypt-then-MAC should be chosen: we head off chosen-ciphertext attacks by rejecting any forged or modified ciphertexts before even attempting to decrypt them. This should be a guiding principle: avoid doing anything with data that is not authenticated.
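To make the ordering concrete, here's a sketch of the receive side using Java's JCA primitives. The wire format, key handling and CTR-mode choice are my own illustrative assumptions, not anything from the TLS specification: the tag is verified over everything received, and only then is the ciphertext touched.

import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

final class EtmReceiver {
    // assumed wire format: 16-byte IV || ciphertext || 32-byte HMAC-SHA256 tag
    static byte[] decrypt(SecretKey encKey, SecretKey macKey, byte[] record)
            throws GeneralSecurityException {
        byte[] iv = Arrays.copyOfRange(record, 0, 16);
        byte[] ciphertext = Arrays.copyOfRange(record, 16, record.length - 32);
        byte[] tag = Arrays.copyOfRange(record, record.length - 32, record.length);

        // 1. authenticate first, over everything the attacker could have touched
        Mac hmac = Mac.getInstance("HmacSHA256");
        hmac.init(macKey);
        hmac.update(iv);
        hmac.update(ciphertext);
        if (!MessageDigest.isEqual(hmac.doFinal(), tag)) { // constant-time compare
            throw new GeneralSecurityException("bad MAC; dropping record");
        }

        // 2. only now touch the ciphertext (CTR sidesteps padding oracles entirely)
        Cipher aes = Cipher.getInstance("AES/CTR/NoPadding");
        aes.init(Cipher.DECRYPT_MODE, encKey, new IvParameterSpec(iv));
        return aes.doFinal(ciphertext);
    }
}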

The mandatory cipher suite in TLS 1.2 is TLS_RSA_WITH_AES_128_CBC_SHA. While this can be a strong cipher suite, it's not guaranteed to be: it's quite easy to balls-up AES-CBC and introduce padding oracles and the like. I would argue that the default should be based around AES-GCM, as this provides authenticated encryption without so much as lifting a finger. Even the choice of MAC in the current default is looking wobbly. It's by no means gone, but people are migrating away from HMAC-SHA1 to better MAC constructions. I would recommend exploiting the parallelism of current-generation hardware by allowing a PMAC.

There should also be allowances for, and maybe even a preference towards, well-studied ciphers that are designed to avoid side-channels, such as Salsa20 and ChaCha20.

There should be no equivalent of the current "null" ciphers. What a terrible idea.

I like that in the current RFC, the maximum datagram size is specified. That means that the memory usage per-connection can be bounded, and the potential impact of any denial of service attack better understood before deployment.

For the implementation, I am not fussy. Upgrading an existing implementation is completely fine by me. However, it should be formally verified. SPARK Ada may be a good choice here, but I have nothing against ACSL and C. The latter "could" be applied to many existing projects, and so may be more applicable. There are also more C programmers than there are Ada programmers.

Personally, I think SPARK Ada would be a fantastic choice, but the license scares me. Unfortunately, ACSL has its own host of problems, primarily that its proof annotations are not "as good" as SPARK's, due to the much more relaxed language semantics of C.

Any implementation which is formally verified should be kept as short and readable as reasonably possible, to facilitate formal review. The task of any reviewers would be to determine whether any side-channels exist, and if so, what remediation actions could be taken. Further, the reviewers should generally be checking that the right proofs have been proven -- that is, that the specification of the program is in line with the "new" TLS specification.

Responsibilities should be shirked where it makes good sense. For instance, randomness shouldn't be "hoped for" by simply using the PID or the server boot time. Use the OS-provided randomness, and if performance is paramount, use a CSPRNG periodically reseeded from true random (on Linux, this is supposed to be provided by /dev/random).

The implementation should, as far as possible, aim to meet the specification, as well as providing hard bounds for the time and memory usage of each operation. This should help with avoiding denial of service attacks.

The deployment would be the toughest part. One could suppose that a few extra cipher suites and a TLS 1.2 mode of operation could be added to ease deployment, but this would seriously burden any reviewers and implementors. The existing libraries could implement the new TLS specification without too much hassle, or even go through their existing code-bases and try to move to a formally-verified model, then add in the new version. Once a sufficient number of servers started accepting the new TLS, a movement in earnest to a formally verified implementation could begin (unless this route was taken from the start by an existing library).

Any suitable implementation that wants major uptake would ensure that it has a suitable license and gets itself out there. Current projects could advertise their use of formal methods to ensure the safety of their library over others to try to win market share.

We had the opportunity to break backwards compatibility once and fix many of these issues. We did not, and we've been severely bitten for it. We should really go for it now, before the next Heartbleed. Then we can go back to the fun stuff, rather than forever looking over our shoulders for the next OpenSSL CVE.

Tuesday 2 September 2014

Lovelocks: The Gap Between Intimacy and Security

Let me start by asking a question. One that has been thoroughly explored by others, but may not have occurred to you:

Why do you lock your doors when you leave your house?

We know that almost all locks can be defeated, either surreptitiously (for example, by picking or bumping the lock) or destructively (for example, by snapping it). What does this say about your choice to lock the door?

Many agree with me when I say that a lock on a private residence is a social contract. To those who are painfully aware of how weak locks are, they represent a firm but polite "Keep Out" notice. They are there to mark your private space.

Now, assume that you and your partner are in a long-distance relationship with one another, and that it's the 1950s. Love letters would have been common in this era, and you would likely have kept them in your house. I know that I did just this with previous partners.

Imagine that someone breaks into your house and photographs or photocopies those letters, and posts their contents in an international newspaper. To learn of the intrusion, and that you had been attacked on such a fundamental level, would be devastating.

To my mind, that is a reasonable analogy for what has happened to those who have had their intimate photos taken and posted to 4chan. An attacker defeated clear security mechanisms to gain access to information that was obviously, deeply, private and personal to the victims.

These victims were not fools; they did the equivalent of locking their doors. Maybe they didn't buy bump-, snap-, drill- and pick-resistant locks for over £100 per entrance, but they still marked the door as the entrance to a private space.

It is right that the police should be treating this breach as an extremely serious assault on these women's (and they are all women at this time) personal lives, and therefore pursuing the perpetrators to the fullest extent allowable under law.

Claiming that the victims should have practised better security in this case is blaming the victim. If I go to a bank and entrust my will to one of their safety deposit boxes (now a dying service), and it is stolen or altered in my absence, am I at fault? No: the bank should be investing in better locks -- that's what they're paid to do; and beyond that, people shouldn't be robbing banks. Bank robbers are not my fault, and iCloud hackers are not the fault of these women.

Further, it is just plain daft to claim that these women could both protect themselves in this world and maintain the intimate relationships that they wanted to. Security is very hard; we know this, as barely a week passes without the members of some service being urged to change their passwords due to a breach. And these breaches affect entities of all sizes, from major corporations to individuals. Even agencies with budgets the size of a small country's GDP, whose remit is to protect the lives of many people, have serious breaches, as evidenced by the ongoing Snowden leaks.

Expecting anyone to be cognizant of the security implications and necessary precautions of sharing an intimate moment with a partner is to ask them to question their trust in that person and the openness of that moment. Security is the quintessential closing-off of oneself from the world, of limiting trust and exerting control. Intimacy, even at long distance, is about opening oneself up to another person, sharing yourself with them, extending to them a deep trust, and allowing them a spectacular amount of control over yourself, physically and emotionally. To place these two worlds together is cognitive dissonance made manifest.

And yet, we use these flawed security devices to display and proclaim our love -- and hence intimacy -- with one another. Even as we put love locks on bridges, we know that they can be removed. We acknowledge their physical limitations even as we use them to communicate our emotions to each other.


We are accepting of the persistent and somewhat insurmountable failings of physical security, and we do not blame the victim when their physical security is breached. It is also the case that physical and digital security are in many senses a good analogy of one another, but that we apply different standards. We need to start realising that digital security is also imperfect, and further, that it is not the victims' fault when that security fails.

Friday 29 August 2014

Biometric Authentication

Overview

Given that the BBC has recently published an article promoting biometrics as the password-replacement technology, I'd like to beat this dead horse once more and set out why biometrics are such a bad idea.

Duress

Many very secure systems which require a PIN, password or passphrase (henceforth, just "secret") often have multiple secrets.

These exist for the end user should they ever be under duress, i.e. coercion, threats, intimidation. When entered, a duress secret gives the adversaries what they want, but also alerts the system that something is very wrong.

There's no such thing as a "duress iris scan."

Recovery

We have methods of recovering fingerprints from objects; part of our forensic system is based on exactly this. Fingerprints can also be duplicated.

Unfortunately for users of such a system, when your adversaries do exactly that, how are you going to change your fingerprints to fix the issue?

Resistance

Take, for example, a DNA-based biometric. You often leave your DNA in places you go, and on objects you touch. It's often found on things you use to eat or drink, like drinks cans.

That means that someone rummaging through your bins will likely get the key to impersonating you, with a 99.9% chance of success and a ~0% chance of detection. Good job we had something super secure!

Discriminatory

Biometrics are mostly based on you having certain body parts, with few exceptions.

Lost your hands in an industrial accident? Sorry, you can't vote. Born without eyes? Not allowed a bank account. Mute? No collecting your pension from the post office.

Perverse Incentives

Fingerprints, retina scans and iris scans present the adversary with some pretty perverse incentives.

Imagine you are targeted for attack. Previously, with passwords, someone would research you and send a well-crafted email asking for your password (or other secret), or directing you to a website that would drop some malware on your machine. This would probably be the most effective route for an adversary, and is often called "spear phishing".

Not nice, but nothing is physically threatening you.

Now, many adversaries will think: "We need their fingerprints; fingerprints are kept on fingers; we need their fingers!" and come to visit you with some bolt cutters. How is this any better than encouraging someone to deceive you, where, when you fall for it and notice (for instance) bank funds missing, you just change your secrets?

Conclusion

Secure authentication systems don't come from verifying your identity to some ridiculously high degree of confidence. Secure systems in general accept that there will be failures -- passwords forgotten, sessions left open on public terminals, etc. -- and have mechanisms in place to resist and recover from these scenarios.

Biometric-based authentication systems take the verification step to its absolute maximum, but they provide none of the other extremely important features of other authentication systems.

In short, do not use biometrics for authentication.

Thursday 10 July 2014

Sortition Governance

Overview

Recently, I've been thinking about how our democracy works here in the UK. This has mainly been spurred on by a couple of things, but for this blog post, the main impetus was the strike action this morning.

Strike Action

Many public sector workers have gone on strike over pay, pensions and conditions. A discussion on BBC Radio 4 this morning, involving a government official and an official from Unison, noted that many of the strike ballots only had a turnout of around 20%. This was claimed to be, at best, a failure of democracy and, at worst, a direct abuse of the system.

When compared to the Conservatives' PCC election farce in 2012, with a turnout of 15.1%, attacking a 20% turnout as illegitimate or unrepresentative is pure hypocrisy.

Concentration of Power

Democracy is, in essence, a codified method of concentrating the power of the many in the hands of a trusted few with the aim of producing legislation that benefits the many.

Many distributed systems suffer and become vulnerable when power is concentrated too strongly. In the case of Bitcoin, this concentration is often seen in large mining pools, leading to such issues as the 51% attack.

In a democracy, this concentration of power is wielded by people -- fallible humans. People who can be swayed with bribery, extortion or simple cronyism. This leads to sub-par legislation, and outright exploitation of the vulnerable members of society.

Sortition

Sortition is a method of governance whereby the legislators are picked from a pool of eligible people at random. This means that no matter how wealthy you are, who your friends are, or what you do for a living, you may be selected to serve, thus distributing power to the populace much more effectively, and removing power from a wealthy few.

Sortition House of Lords

In our current system, legislation usually must pass the House of Lords. These are not elected officials, but they have often sent bad legislation back to the Commons to be "fixed".

In this method, every piece of legislation the Commons produces must, to reach Royal Assent, pass a vote by a suitably large sample of the population, selected at random. Anyone shirking their duty to vote would be fined or jailed.

This vote should ensure that any legislation is suitably aligned with the views and needs of the public, and would entail the minimum of disruption to people's lives. However, if the Commons never suggests the legislation that the people require or desire, it can never come to fruition.

It also does not provide fantastic protection against bribery.

Sortition Commons

As described above, under sortition the legislators are picked from a pool of eligible people at random: no matter how wealthy you are, who your friends are, or what you do for a living, you may be selected to serve. Here, this would apply to the Commons itself.

To ensure that the committee is representative of the average person, members could be paid a wage close to what a common person could expect: for example, the median wage of the country, plus 10%. That way, on average, people neither lose out nor gain too much from serving on the committee. It would also enable the legislature to enforce service, threatening fines or jail for not serving, much like jury duty. Where a person felt they could not serve at the time they were selected -- for example, a new mother -- they could defer their service to a time when they were more able to serve.

To ensure that a wealthy few could not simply prevent unfavourable laws from coming to fruition, this system would do away with the House of Lords and the process of Royal Assent. Any legislation produced by the sortition Commons would become law without any other outside interference.

This system should ensure that the wealthy don't have too much say in the process (they're only the 1%, after all! They'd commonly make up 1% of the committee, and thus have little say) and that those making the legislation are insulated from the temptations of power. It would not by itself prevent bribery; additional checks and balances may be required to ensure that no bribery occurred.

Issues

Sortition has several criticisms commonly levelled at it, many of which revolve around a fear of loss of control.

The Racist Committee

With a sufficiently large sample of people, having an overwhelming representation of extreme views, such as racism, is highly unlikely.

To ensure that, in the unlikely event that an extreme law is passed, it can be undone, the public could have a variant of the right to recall for any legislation. It could even be enshrined in a constitution which upholds human rights and the operation of the sortition committee. Any changes to the constitution would have to be put to a referendum.

The Bribery Problem

Many members of a sortition committee may be susceptible to bribery. A simple solution is to have the members of the sortition committee be completely anonymous until after their term has been served. If you can't find someone, you can't bribe them.

There could also be the stipulation that their accounts be checked for irregularities and large payments, to ensure that if their anonymity is breached, they do not accept any bribes. If they are found to have accepted bribes, the member of the committee and the person paying the bribe should be subject to hefty fines and possibly jail.

Incompetence

Many people would be worried about members of the sortition committee not having the relevant skills to produce legislation or respond adequately to the demands of the public service. This could, in part, be addressed by adequate training. It should probably include some training in statistics, and especially its common misuses.

Another potential way to deal with this would be to have a large body of experts available to answer questions and give expert opinions. Much as an expert witness may be called to a trial, a request could be made for (for example) experts in civil engineering; several could be found and asked for their opinions, or to produce independent reports for the committee to consider.

Conclusion

I'm not overly convinced of this idea. I keep asking those around me about it, and it usually receives mixed reactions, with some offering additions and modifications. Most feel that in our current system the idea is untenable; others feel that it's only achievable through incremental changes to the current system.

I feel that sortition has a reasonable chance of working; however, it needs to prove its mettle in the small scale first. For example, unions, the ACM, the IEEE or NICE could use sortition to select their governing persons from their membership without discrimination. Even The Co-operative Group, UK could potentially use it to select its governance.

Any issues found while "testing" the idea in the small could be used to refine the idea and re-test it.

Friday 16 May 2014

Should OpenSSL be Considered Safety Critical Software?

Overview

I'd like to make a case that OpenSSL should be considered safety-critical software.

Acknowledgements

Thanks to qlkzy, Andrew Stevens and my other half for proof-reading my initial drafts of this and offering valuable feedback.

Safety Critical

Safety-critical software is software whose failure would lead to a "safety loss event". That is, if it all went to shit, there would be loss of life, injury, or major property loss.

The medical, automotive, aerospace and rail industries are all quite good at delivering high-integrity software, and over billions of operational hours, the number of defects has been incredibly low. The number is non-zero, though, and lives have been lost.

OpenSSL

OpenSSL is a library which is used world-wide to ensure the confidentiality and authenticity of communications. That is, no one can read your communication, and no one can tamper with it.

It is most commonly used to secure communications between a client's web browser and a web server. To most people, this just means that their Facebook password remains safe from prying eyes when sent to Facebook.

We can look at some not-too-unrealistic scenarios in which the safety of life and limb are placed in OpenSSL's hands.

Becoming a Critical System

Here are a few scenarios which, although they rely on social effects, show the software very much protecting the life of its user.

Scenario 1

A person who is in an abusive relationship attempts to find resources on the internet that will help them leave the relationship. Most likely, their Google searches will be protected from prying eyes by the fact that their searches are protected by an SSL connection, provided courtesy of OpenSSL.

If the abusive partner finds out about them seeking help, it may end with the victim being forced out of their home, or further abuse or violence from their abusive partner.

Scenario 2

A member of the LGBT community would like to get support or friendship from other members of their community whilst living in a country where being an LGBT person could result in death (for example, Uganda), or living with family who would ostracise an LGBT member (especially problematic for youngsters with no income or home of their own).

The person could use a fake Facebook profile or Reddit account to connect with other LGBT people, and put themselves at serious risk, were their communications not sufficiently protected. Should OpenSSL fail, these people could be killed or forced out of their homes.

Scenario 3

Almost all banks use SSL connections to secure their online-banking facilities.

Assuming that they use OpenSSL (and at some level they will; most web browsers use it), the potential damage to the economy of a large-scale failure is massive.

Scenario 4

A political dissident uses The Onion Router (TOR) to hide their identity and make their opinion of the government known in a country where the price of dissent is death.

TOR currently uses OpenSSL to protect its communications. Should this fail, there would be serious implications for those users, including prosecution, torture or death.

Scenario 5

This is definitely my most out-there scenario, but it could arise, especially in countries where the safety regulations are much more lax.

A nuclear power station is remotely monitored with some control being off-site (e.g. night shifts, etc.)

If the control communications were protected by OpenSSL, and could be interfered with in any way, this could obviously lead to a major ecological disaster, loss of life and serious health issues for the local population.

Conclusion

I hope I've made it clear that I consider OpenSSL (and similar libraries) to be safety critical software, since it is being used to protect life and sizeable swathes of the economy.

I feel strongly that, like a good police force, your safety should be free and available to all regardless of who you are.

Unfortunately, we do not currently treat them as such. There is limited auditing of OpenSSL, and the only SSL library that has had any in-depth review is PolarSSL, which is not free for commercial applications. This means that commercial systems (e.g. banks) will be less likely to use the verified implementation when a free, better-known implementation is available.

Monday 21 April 2014

YubiKey Side Channel Attacks?

Overview

Some time ago, I wrote about the YubiKey, and I've also mentioned how the YubiKey is vulnerable to a simple MiTM attack, by simply phishing the user. This is nothing complicated, and it's hard to defend against this sort of thing.

Given the paper "Cache-timing attacks on AES" by Bernstein (2005), and more recently "Fine grain Cross-VM Attacks on Xen and VMware are possible!" by Apecechea et al. (2014), we now have to begin asking what threats these pose beyond OpenSSL, PolarSSL and libgcrypt.

This is not an actual attack by any means, it is speculation of what might be an interesting and fruitful avenue of research.

YubiKey

As in my previous post: the YubiKey's OTP mechanism is relatively simple. Take one AES block (128 bits), including a counter and a secret Id. Encrypt said block (in the YubiKey "dongle") with a secret key shared between the dongle and the server, and "type" it into the host machine to be sent to the server for checking.

The server decrypts the OTP received and checks the various internal values, e.g. the secret Id, the counter value, etc. If all's well, access is granted.
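In code, the server-side check looks something like the sketch below. The field offsets and widths are illustrative rather than Yubico's exact layout, and a real verifier also tracks session counters, timestamps and a CRC:

import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

final class OtpVerifier {
    // one 16-byte AES block: private Id, counters, timestamp, nonce, checksum
    static boolean verify(SecretKey sharedKey, byte[] otpBlock,
                          byte[] expectedPrivateId, int lastCounter)
            throws GeneralSecurityException {
        Cipher aes = Cipher.getInstance("AES/ECB/NoPadding"); // a single block
        aes.init(Cipher.DECRYPT_MODE, sharedKey);
        byte[] plain = aes.doFinal(otpBlock);

        byte[] privateId = Arrays.copyOfRange(plain, 0, 6); // illustrative offsets
        int counter = ((plain[7] & 0xff) << 8) | (plain[6] & 0xff);

        // accept only a matching secret Id and a strictly increasing counter
        return MessageDigest.isEqual(privateId, expectedPrivateId)
                && counter > lastCounter;
    }
}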

The Attack

Apecechea's attack relies on co-locating VMs with a target, profiling the hardware architecture, then sending chosen plaintexts to the target to be encrypted under an unknown key.

I am wondering if, given a VM which verifies YubiKey OTPs, one could co-locate a couple of malicious VMs, profile the server hardware, then send lots of OTPs to be validated (i.e. decrypted) to the target VM.

If this exposed the secret key for the YubiKey, one could then decrypt any previous YubiKey OTP, and extract the secret Id and other values, and simply forge valid YubiKey OTPs.

Mitigations

AES-NI

According to Apecechea's paper, AES-NI aware implementations were much more difficult to attack. This could well be a saving grace in many situations, not just the YubiKey situation.

YubiKey HSM

I am not familiar with the YubiKey hardware security module. I know that it has undergone review, but I don't think that review extended to side-channel or physical attacks.

If the HSM is a constant-time verifier of YubiKey OTPs, then this is a winner. Any organisation using a YubiKey HSM is automatically saved.

Append a MAC

Appending a message authentication code to the OTP (though AES-CMAC would be a bad idea here, as it would reintroduce the very AES side channel we're worried about) would allow forgeries to be rejected outright, without any decryption of the OTP.

This does not stop attackers from trying to get as many OTPs with valid MACs as possible and using them as part of their attack, but it would take much longer, since the paper calls for 2^26 to 2^28 attempts.

It would also make the typed OTP much longer, and that means more "typing" for the device, and more complexity in the device. The newer YubiKeys have OATH capabilities, so they should have HMAC hardware in them, which may make this more feasible.

Using a MAC may obviate the need for an OTP anyways, so what the hey.

Rate Limiting

Simply limiting the number of attempts allowed per second, as a general rule, would alleviate the problem, as the attacker needs a fairly large number of attempts (though by no means a prohibitive number). Making each OTP verification take 1 second would make the attack take 2^26 seconds (roughly 2 years) per YubiKey the attacker wished to break.
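A naive sketch of the sort of thing I mean, with hypothetical class and method names: one verification attempt per key per interval, where each rejected attempt resets the clock, so a hammering attacker stays locked out:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class VerificationRateLimiter {
    private final Map<String, Long> lastAttempt = new ConcurrentHashMap<>();
    private final long minIntervalMillis;

    VerificationRateLimiter(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // returns true if a verification attempt for this key may proceed
    boolean tryAcquire(String keyId) {
        long now = System.currentTimeMillis();
        Long previous = lastAttempt.put(keyId, now);
        // note: a rejected attempt still updates the timestamp, so an
        // attacker hammering the server never gets another go
        return previous == null || now - previous >= minIntervalMillis;
    }
}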

This obviously is not a great way to do things, but it's better than nothing, and combined with suitable key rotation (say, once a year), you'll be quite safe. But remember, attacks only get better.

Hardened AES

This would be super, but it would seem that new side-channel attacks on AES just won't stop. Be they power, timing, or cache-based, we're not fighting a winning battle here.

In much the same way as patching OpenSSL against BEAST led to Lucky13, I worry that we're going to end up chasing our tail with this one, since the performance trade-off is so great.

Constant Time Encryption

Something like Salsa20 should be constant-time when implemented correctly; it should have no timing leaks at all.

Salsa20 is also more amenable to an embedded environment than AES, and offers similar security guarantees.

If I were the YubiKey guys, this is where I'd be focusing my attention and looking to migrate, assuming I wanted to keep the current usage pattern of the YubiKey (plug it in, have it "type" a password). It is a much faster, simpler algorithm; it's designed to be strong against side-channel attacks; the reference implementation is public domain; and it is written without any use of malloc, so you can get it into an embedded system easily.
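
To see why: Salsa20's core is nothing but 32-bit additions, XORs and rotations by fixed distances, with no secret-indexed table lookups for the cache to betray. The quarter-round (transcribed from Bernstein's spec) looks like this:

#include <stdint.h>

#define ROTL32(v, c) (((v) << (c)) | ((v) >> (32 - (c))))

/* Salsa20 quarter-round: add, xor and fixed-distance rotates only,
 * so no secret-dependent branches or memory accesses. */
static void quarterround(uint32_t x[16], int a, int b, int c, int d)
{
    x[b] ^= ROTL32(x[a] + x[d],  7);
    x[c] ^= ROTL32(x[b] + x[a],  9);
    x[d] ^= ROTL32(x[c] + x[b], 13);
    x[a] ^= ROTL32(x[d] + x[c], 18);
}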

Seriously, Salsa20 is where it's at.

Conclusion

Looking at the YubiKey, since the verification makes use of AES, it makes sense to ask whether it is safe against Bernstein's cache-timing attacks and, on the assumption that it is not, what measures can be taken.

Overall, I would say that a good implementation of rate limiting should be used whilst looking to migrate away from AES to a constant-time cipher; my strong preference would be Salsa20.


Tuesday 15 April 2014

A New Spectator Sport

You know when you see firefighters at a car crash, trying to rescue someone? Yea, the folks at OpenBSD are undertaking something similar right now.

Highlights 2014-04-14 13:45

The next gotofail?


"correct cases of code occuring directly after goto/break/returnok miod@ guenther@"

I have no indication that this would lead to a genuine vulnerability, but christ, the code-smell.

The Entropy is 2014-04-14 13:43:13 



"So the OpenSSL codebase does "get the time, add it as a random seed"in a bunch of places inside the TLS engine, to try to keep entropy high.I wonder if their moto is "If you can't solve a problem, at least tryto do it badly".ok miod"

Apparently OpenSSL is adding the time to their entropy pool?  But attackers can easily guess the time that your server thinks it is, so is this actually breaking things, or is the entropy pool sufficiently resistant to these kinds of boo-boos?

Highlights 2014-04-15 12:04

FOR GLORY

"Send the rotIBM stream cipher (ebcdic) to Valhalla to party for eternity with the bearded ones... some API's that nobody should be using will dissapear with this commit."

I love this guy's sense of humor. I didn't even know there was a "rotIBM" stream cipher. I'm sure it must've undergone a rigorous cryptanalysis, and IBM have never gotten in bed with nefarious government agencies. Oh, and pigs fly.

Header

"strncpy(d, s, strlen(s)) is a special kind of stupid. even when it's right,it looks wrong. replace with auditable code and eliminate many strlen calls to improve efficiency. (wait, did somebody say FASTER?) ok beck"


I think the commit message says it all.
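
For anyone who hasn't been bitten by it: strncpy(d, s, strlen(s)) copies exactly strlen(s) bytes and never writes a terminating NUL, so the result is only a valid string if d happened to be zeroed already. A toy demonstration:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char d[8] = "XXXXXXX";          /* deliberately not zeroed */

    /* Copies exactly 3 bytes and no terminating NUL... */
    strncpy(d, "abc", strlen("abc"));

    printf("%s\n", d);              /* prints "abcXXXX", not "abc" */
    return 0;
}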

 

Wednesday 9 April 2014

My Heart Bleeds for OpenSSL

Overview

Today has been eventful, to say the least. My colleague and I patched several services, re-issued keys for them and started the ball rolling on revocation. It took all sodding day.

The progenitor of all this has generated a lot of noise and a great many internet opinions.

Internet Opinions

Basically, a lot of people have said a lot of things, and I'd like to address one or two of them in something more than a pithy tweet.

Dump OpenSSL

I have seen this almost everywhere I looked. The call to arms to fling out OpenSSL, or plain re-write it. I get it, I understand that OpenSSL is not the healthiest code base. I often use it as an example of an extremely unhealthy code base.

However, just because your friend is ill, doesn't mean you take them out behind the chemical sheds and shoot them.

OpenSSL has a long and "rich" history. In fact, it's got so much history, that some of it has become useful! Rewriting is almost always a mistake.

Depending on how deep we go, we could end up throwing everything away: the CA infrastructure, the cipher suite flexibility, the vast quantities of knowledge that surround the OpenSSL ecosystem and allow thousands of developers to work in a more secure manner than would otherwise be possible.

I want to cover the cipher suite flexibility quickly, just to point out how much of a fantastic idea it is. The server and client tell each other what their most-preferred ciphers are, then they agree on the strongest one that they both support.

This has massive benefits. For instance, when BEAST et al. came along and effectively broke the block-cipher-based suites for a while, we could jump to RC4. When that was broken, we all hauled ass back to AES. This gives incredible flexibility when dealing with many attacks, by simply being able to circumvent the affected parts.
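
To illustrate, here is a minimal sketch using OpenSSL's own API (the cipher strings are examples, not hardening advice): the whole "jump to RC4, then haul ass back" dance is a one-line configuration change, with no code rewritten on either end.

#include <openssl/ssl.h>

/* Sketch: cipher suite agility as configuration. */
static SSL_CTX *make_server_ctx(int beast_era)
{
    SSL_CTX *ctx = SSL_CTX_new(SSLv23_server_method());
    if (ctx == NULL)
        return NULL;

    if (beast_era)
        /* BEAST: prefer RC4 over the broken CBC suites. */
        SSL_CTX_set_cipher_list(ctx, "RC4-SHA:HIGH:!aNULL:!eNULL");
    else
        /* RC4 broken: back to AES, again without touching code. */
        SSL_CTX_set_cipher_list(ctx, "AES128-SHA:HIGH:!RC4:!aNULL:!eNULL");

    return ctx;
}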

Secondly, last I checked, OpenSSL still has support for DES and RC4. This allows for graceful handling of legacy clients. This is very important when you have a very large, heterogeneous installed base.

If we lost cipher suite flexibility, we'd open ourselves up to so much more Bad Stuff.

The CA Infrastructure I'm less happy with, but that's a post for another day, and, frankly, I'd be surprised if anyone seriously improved on it for your average Joe in the next 2 - 3 years. I've not looked into it much, but NameCoin, or a NameCoin-like protocol looks promising from afar.

Suddenly retraining literally tens of thousands of developers, system administrators, etc.? I'm sure that will be a cheap, quick and widely accepted move. It could happen, but OpenSSL is quite entrenched. I would see the move happening over several years, with whatever "comes next" supporting developers as they move from one platform to the other. In fact, it could be TLS 1.3 (I think 1.2 is the latest at the time of writing, correct me in the comments below if I'm wrong) as implemented in *drum roll* OpenSSL.

Throw Money at the Problem

I do not think OpenSSL's developers are monkeys. If you look at the project, it's effectively dying in a lot of respects. It needs more than money: it needs a huge injection of resources, people, contributions, and more than anything else, enthusiasm.

It's kinda sad that one of the most critical pieces of software to most people's lives these days has the appeal to developers of a punch right in the face.

This is mostly because the OpenSSL project is much more organic than most other projects out there -- it's grown.

I have looked at the OpenSSL code, I've tried to ... do ... things to it. It's a big and intimidating code base, and I really did not want to break anything. So much so that I abandoned my idea. I looked into the abyss, and the abyss looked into me.

An injection of cash and some good C developers on the project as their day job would make a huge dent in this. Their first job could be making the code amenable to testing and testing the result. Their second job could be adding newer, safer cipher suites, and so on. But beyond that, OpenSSL is not Java or C#. There's no benevolent banker running the show, there is only the dedication and hard work of the volunteers.

Once the project has had a good cleanup (and probably a few bug fixes!), an audit of some sort would definitely be nice, but I think that this is of limited value. OpenSSL is a constantly evolving project. It is not the code to run a jet engine -- written once and run on well-known hardware. It's changing almost daily.

The only way to get that level of assurance would be to build an automated verification suite, or similar. I'm looking towards Frama-C, though something with more power may be required. That way, the assurances would be quite hard to break, in much the same way as introducing regression bugs is made harder by decent unit testing.
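
For a flavour of what that looks like: Frama-C proves contracts written as special comments (ACSL) against the C source, and re-checks them on every change. A toy example of my own, nothing to do with OpenSSL's actual code:

/*@ requires \valid_read(buf + (0 .. len - 1));
    assigns \nothing;
    ensures \result == 0 || \result == 1;
*/
int all_zero(const unsigned char *buf, unsigned int len)
{
    /*@ loop invariant 0 <= i <= len;
        loop assigns i; */
    for (unsigned int i = 0; i < len; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}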

Conclusion

I feel for the OpenSSL developers, all two of them, I really do. They have come under a huge amount of fire recently. I would not be surprised if they just downed tools to go live in a cabin in the woods tomorrow, out of sheer frustration and upset. When their code works, they get no thanks, when it breaks, they never hear the end of it.

Unfortunately for some projects, it just becomes time to stop adding features, and to go and address some platform problems, to run a comb through it and straighten things out. This is often made hard by a lack of a large test suite.

We did a similar thing with our product at my day job recently. We stopped work on features for nearly 2 years to drag the system out of the dark ages, add in a real security system, upgrade the underlying technology, really improve our test suite, etc. It's painful, slow and incredibly frustrating at the time. But when you're finished, continuing to work with your product is so much easier. It stops the "Let's rewrite!" urge quite well.

Edit, 2014-04-09

There is some discussion over on Hacker News (Hello!). I'd like to address a couple of the points.

SSL/TLS /= OpenSSL

I completely agree. The reason this comes across as a point that needs to be made is that, in writing the post in the middle of the night, immediately before going to bed and without hashing the ideas out properly, I've managed to put issues that were solely OpenSSL's fault right next to issues that were the spec's fault.

The reason that I talk about the potential for losing things like cipher suite flexibility is that a new library would likely implement something AES & RSA based with SHA-2 or SHA-3, and leave the others for a rainy day. We need more cipher suites at our fingertips than that (and good, safe, defaults).

Beyond that, unfortunately, attacks like Lucky Thirteen were basically older attacks re-run using timing information to extract data. This sort of attack is only possible with the MAC-then-Encrypt construction that SSL/TLS uses, which is the spec's problem. The fact that it was exploitable via timing is an issue with OpenSSL. I know the bug is effectively intractable, but I'm also aware that OpenSSL has fixed it to a satisfactory level.

You can't spell

I ager.

Seriously though, while I do make an effort, it was the middle of the night and for some reason my spell checker was not red-lining. Opening up the post this morning gave me an opportunity to go through and fix these issues ;-)

OpenSSL isn't formally verified!?

No, neither is any part of your browser, your kernel, your hardware, the image rendering libraries that your browser uses, the web servers you talk to, or basically any other part of the system you use.

The closest to formally verified in your day-to-day life that you're going to get may well be the brakes on your car, or the control systems on a jet engine. I shit you not.

We live on fragile hopes and dreams.

Monoculture Leads to Mass Vulnerability

I think that this is a real problem. However, this is crypto code. I know, I know, it's still a systems problem in lots of ways, but I can't avoid the feeling of "don't roll your own."

Asking normal developers to choose between several competing libraries with different interfaces, vulnerabilities, performance characteristics, etc. would likely result in carnage. We know that developers, when asked to cobble together cipher primitives, make an utter pig's ear of it; why would they be in a position to make those kinds of decisions on a larger scale?

I think that our best bet is to ensure that OpenSSL is a rock-solid piece of software with things like automated verification, extensive testing and independent audits. Unfortunately, that's extremely expensive (and don't forget, re-writing is likely more expensive).

Blogger is Awful

Yes. I am always surprised that my blog doesn't render in a browser with no JS.

Thursday 27 March 2014

Healthy Paranoia

Sometimes, I'll be coding away, make a change that I somewhat expect to break things, and nothing breaks.

Inwardly, I cheer. My test suite is perfect! It has allowed me to refactor with abandon. However, this quickly passes, and I feel the need to know if I am doing it "right", that is, I trust my intuition that it should be broken. Am I running the right version of the code? Did I restart the server to pickup my changes?

Often, I'll find that I was running an older, known-good bit of code, and that my change wasn't picked up. Once I force my changes to the local test server, I can see my changes have subtly broken something, and resume my fixing.

I like to think of this as healthy paranoia. If I've changed something that should've broken at least something, I should be looking for breakage. If none occurs, the likelihood is that I've done something wrong (aside from writing the bad code in the first place).

Friday 21 March 2014

On NaCl

Overview

Since I was introduced to the NaCl library, I've loved it. In the course of considering adding Salsa20/? and Poly1305 support to OpenSSL, I thought I'd take a look at NaCl, since I knew it was based on those primitives as specified in Bernstein's papers.

Acknowledgments

Thanks to Bernstein for the NaCl library, Regehr for his blog on software correctness, and David Morris (qlkzy) for his proof-reading skills and contributions.

Getting the Code

The code is public domain, and very small. However, I could only find mirrors of it on GitHub, and no obvious reference to the "current" release. The best I could do was a tar.bz2 referenced from the install page.

I didn't want to install it, only read the code! Please, please throw up a prominent link to the following:
  • Source repository
  • Documentation
  • Issue Tracker
The OpenSSL project has all of these, despite being in the gutter in many other respects. They're not necessarily in a "prominent" place, and in the case of the documentation, it's incomplete.

Currently, simply pointing to http://hyperelliptic.org/nacl/nacl-20110221.tar.bz2 is insufficient, as it raises a few questions:
  • Is this the latest release? If not, which is the latest?
  • Why is it not distributed from somewhere supporting SSL/TLS?
  • Why is it on hyperelliptic and not nacl.cr.yp.to as the rest of the site is?
These are minor things, but they are crucial for ensuring confidence in the project, its long-term survival, and that anyone wishing to contribute can do so effectively.

Duplicate Code

Nothing too worrisome, but there are a few pieces of duplicate code. They're mostly utility functions like load_littleendian and rotate. The drawbacks of writing code multiple times are well documented, and I don't know why Bernstein chose to duplicate the code (the copies are declared static) rather than pull them up into a utility library of some sort. Hell, given how short these functions are, not making them simply macros seems a little odd.

The methods I saw duplicated most often are marked static, and as pointed out by qlkzy, they're probably there to ease the compiler's job when in-lining functions.

Undefined Behaviour

There is at least one function used regularly throughout the source (and duplicated!) that could be considered to have undefined behaviour. I have copied it here from crypto_core/salsa20/ref/core.c:

static uint32 rotate(uint32 u,int c)
{
  return (u << c) | (u >> (32 - c));
} 
 
This function is used as part of a round for Salsa20, and it is only called with c drawn from a fixed set of values: 7, 9, 13 and 18.

This is fine if the compiler can figure out that the possible set of input values for c never includes 32.

If, however, the compiler isn't so smart, it may assume that c can span the full range of int, meaning that the expression is undefined for several values (most notably c = 32), and so could be optimised to whatever the compiler feels like.

Regehr has an excellent blog post on this already, but it covers the general case, where the equivalent of c can be any value. In Salsa20, we don't need such generality.

There are a couple of potential fixes off the top of my head:

Use Regehr's Safe Rotate

Using the safe rotate proposed in Regehr's blog should alleviate these problems, but we don't actually need the generality.
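
From memory, his safe rotate is roughly of this shape (see Regehr's post for the authoritative version):

#include <stdint.h>

/* Defined for every n in [0, 31], including n == 0: the (-n & 31)
 * trick never produces a shift by 32, and modern compilers generally
 * recognise the pattern and emit a single rotate instruction. */
static uint32_t rotl32(uint32_t x, uint32_t n)
{
    return (x << n) | (x >> (-n & 31));
}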

There is also the issue that, on older versions of Clang, his safe rotate is not recognised and optimised correctly, introducing some problematic performance issues.

According to the log on Regehr's bug, this has probably been fixed by now, and so it could be safely used.

Define Multiple Rotate Functions

Define several functions: rotate_7(uint32 u), rotate_9(uint32 u), rotate_13(uint32 u) and rotate_18(uint32 u). For example:

static uint32 rotate_7(uint32 u)
{
  return (u << 7) | (u >> 25);
} 

This would ensure that the compiler knows the shifts are never undefined. Hopefully, a good compiler would aggressively inline these.

Define Macros

Defining a macro would give a few benefits:

  • It could be put into a header file somewhere and used everywhere
  • It would be inlined automatically by the pre-processor
  • It can be made to avoid most forms of undefined behaviour
The macro would simply be:

#define ROTATE_7(u) (((u) << 7) | ((u) >> 25))

That would allow for a rotate-left by 7, avoiding any undefined behaviour and forcing the C pre-processor to inline it for you. These can be written once in a header file.
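
A sketch of the shared header (the file name is hypothetical; each pair of shift counts simply sums to 32):

/* rotate.h -- one definition of each rotate, shared everywhere. */
#ifndef ROTATE_H
#define ROTATE_H

#define ROTATE_7(u)  (((u) << 7)  | ((u) >> 25))
#define ROTATE_9(u)  (((u) << 9)  | ((u) >> 23))
#define ROTATE_13(u) (((u) << 13) | ((u) >> 19))
#define ROTATE_18(u) (((u) << 18) | ((u) >> 14))

#endif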

However, many C programmers feel that macros are quite messy, and so would prefer to simply declare non-static utility functions in a header and use those; but compilers might not inline these, leading to an unacceptable performance penalty.

Ok Compilers

Once I got the code, I found a list called "okcompilers", posted here for completeness:

gcc -m64 -O3 -fomit-frame-pointer -funroll-loops
gcc -m64 -O -fomit-frame-pointer
gcc -m64 -fomit-frame-pointer
gcc -m32 -O3 -fomit-frame-pointer -funroll-loops
gcc -m32 -O -fomit-frame-pointer
gcc -m32 -fomit-frame-pointer
spu-gcc -mstdmain -march=cell -O3 -funroll-loops -fomit-frame-pointer -Drandom=rand -Dsrandom=srand
spu-gcc -mstdmain -march=cell -O -fomit-frame-pointer -Drandom=rand -Dsrandom=srand

This has the really bad effect of tying yourself to a given set of compilers and flags. This means that you won't get any new features or tooling from newer compilers (e.g. Clang with Valgrind) and that those who don't have access to the specified compilers will not feel at ease when building the source.

Access to the latest and greatest tooling is often critical for tracking down bugs, memory leaks, etc.

"Will it break, won't it break?" should not be on my mind when I'm building with a different, but still standards-compliant, compiler.

There is also the issue of knowing exactly which flags are required for performance, and which are required for security. What happens if I can't use -funroll-loops: does this hurt the performance, or does it affect the timing invariance of the algorithms? What about -O3, is that only for performance purposes? This is about as clear as mud.

Non-Standard Build

The installation procedure recommends I use the "./do" command to build the source. Just use Make; we all know how to use it. It's standard, well understood and easily extensible.

I wouldn't normally advocate Make, but something is better than nothing, and for a small project like this, I don't think it would need anything more advanced than Make.

Supplying Assembler Alternatives

This seems like a good idea at first: you can hand-optimise the assembler better than a compiler can, and, compared to C, avoid undefined behaviour! Win!

However, these are static blobs in comparison to C code. C code inherits a lot of benefits from compiler development. For example, I don't think you'd be able to do anything with the assembler blob and valgrind or ctgrind.

You also can't hand-optimise better than a compiler, not any more. And while attacks only get better, compilers only get more aggressive with their optimisations.

Conclusion

I love the idea of the NaCl library, but I think that for its long-term survival, it needs to pull up its "administrative socks", as it were. Potential undefined behaviour should be a huge no-no in cryptographic software, and standard software engineering practices (such as a standardised build system, version control, and so on) should be a given.

I would hate for such a fantastic library and concept to disappear or fall by the wayside because of such trivial issues.

Sunday 16 March 2014

Google+ Deception

Overview

Google+ is deceitful and I consider it to be harmful.

Blogger and Google+

Obviously, I use blogger (Or is it blogspot?) for blogging. I have a couple of blogs, and I update them reasonably frequently.

I was recently prompted on one of my blogs to start integrating with Google+ which is something that I would not want. I found the message to be difficult to follow at first glance, and couldn't see easily how to decline the offer.

Fast forward to today, and both of my blogs had at some point gone from prompting me to share on Google+ to simply posting there automatically. This is unwanted behaviour, as I do not wish to use Google+, or any social network for that matter.

I do not want to "self-promote" my personal blogs and spam my friends. If they want to read my blog(s), they will do so without me spamming them.

Worse still, the choice of whether to spam people I know, on a social network I don't want to use, was taken away from me. I didn't choose to use Google+, and I didn't choose to spam those people.

Google, please, for the love of all that is holy, please stop trying to force everyone to be a Google+ user. It's like the creepy guy with the beat-up van with "Free Candy" hand written on the side. No one wants what you're offering, and everyone knows that it's the old bait-and-switch. Just stop.

Thursday 27 February 2014

PPUK Reform

Overview

The Pirate Party UK (PPUK) is a political party in the UK, who typically fight for civil liberties, copyright reform and a more transparent government. Full disclosure: I'm a paid-up member of PPUK.

This is not a reform proposal in the traditional sense. This is a look at how PPUK acts and is perceived, and if that can be altered to be more of a positive. Think of it as a perspective reform.

Where we're at

Historically, we've taken on campaigns at both the local and national level. We should keep on doing this, as both levels are required to effect our goals.

However, our campaigns have almost always been in opposition to something, for example, we opposed the withdrawal of Legal Aid, we opposed the detention of Chelsea Manning, the "snooper's charter", and so on.

When it comes down to it, we expend a disproportionate amount of energy saying "no" to our opponents, and we often celebrate their defeats. This gives the impression to the public (well, those that know about us; that's a whole other blog post) that we are adversarial, and against progress.

We need to acknowledge that the stated goals of these schemes may have merit, bring substantial benefit to the public (ex. care.data), and even that they may align with our goals -- but that the proposed implementation has issues; we need to offer alternatives which still support the goal.

Further to this, we don't provide anything to the public -- no tools for activists, little in the way of guides for getting up and running in a local area, no how-tos for running a campaign, nothing.

The Future?

I feel that, once we're a larger organisation, it would be most helpful for us to conduct ourselves in a way that is consistent with the goals and principles laid out in our manifesto. Beyond that I feel that we should be aiming to improve our society.

I think that this means, instead of simply opposing problems, (e.g. pharmaceutical greed, violation of civil liberties, invasion of privacy, etc.) we must support alternative solutions.

The simplest one from that list is the invasion of our privacy -- we can support projects that improve people's privacy online (e.g. HTTPS Everywhere, TOR, GnuPG). We can provide infrastructure; many of our volunteers have skills they could offer (programming, design, UX, etc.); and we can give such projects more public support and exposure, build our own complementary tools, educate people in their proper usage, and so on.

Personally, I feel that one of our most lacking areas is public education. We have so much knowledge, yet we consistently fail to share it.

At our most recent branch meeting, I was asking about how many members we have, and the numbers I got back varied from 300 to 700. I also happen to know that the number who voted in our most recent NEC elections was a mere 57 people for one of the posts. I am strongly of the belief that this is due to disaffection in our own ranks, stemming from our perceived (and, in some cases, actual) lack of action. We must do; the future is not opposed, the future is built.

Friday 21 February 2014

The Security of the Proposed care.data Scheme

Overview

Ceri the Duck has a blog post titled "Care.Data – why I am happy for my medical records to be shared for research". In this post, they make the argument that under the care.data scheme, data protection would be improved for patients.

In this post, I will tackle this misconception, which revolves around two core arguments: firstly, that there will be limited access to identifiable information; and secondly, that the care.data scheme will provide a better security framework to work within.


No Identifiable Information

"care.data will only provide access to ‘potentially identifiable’ information"
If we look at the HSCIC price list, we see that they provide an extract of data "containing personal confidential data".

Even if this data has names, and other directly identifiable data stripped out, we know that anonymised data can be de-anonymised almost trivially (Further reading from Light Blue Touchpaper) in the vast majority of cases.

A Better Security Framework

"I am much happier with the level of data security care.data will provide than with the current ad-hoc arrangements. They will be consistent, with good oversight, the information disclosed will only be what is needed instead of having to comb through a patient’s full record, ..."
This is simply not true. The care.data scheme will be taking medical records from a setting where they are hard to sort through even with legitimate access (as pointed out by the author themselves), to one where the records will be much more easily accessible to many thousands of people, none of whom will have undergone any serious training in information security, data protection law, or the ethical issues surrounding the use and dispersal of this data. It is also highly unlikely that they will have had so much as a criminal record background check.

Granted, the current situation is very poor, but it does not allow for large-scale abuse of the system. Sure, I could target Bob Smith, break into his doctor's surgery and steal his record. With the new system, I could target large swathes of the population by simply bribing the right people. The information gained could be used by all manner of people, for everything from surreptitious background checks on potential dates to discrediting a political candidate, and everything in between.

And that's just bribing-based attacks. Think of the rubber hose cryptanalysis opportunities, the social engineering based attacks, physical security attacks, phishing and spear phishing attacks, attacks on end-point security (this is your classical "hacking into the computer" attack), and so on.

Data transfer must occur at some point, and with great data transfer comes great opportunity. How do you conduct the transfer, and how do you set it up so that you're definitely sending the data to the person(s) you think you are? These are generally considered solved problems in the cryptographic community (though key distribution, for instance, remains hard), but in practice, getting them right is anything but solved.

In short, this centralisation effectively paints a massive target on the back of the country's medical records, and hands access to institutions with some of the worst information security going.

The adversaries in this situation will not be small time. Computer crime is big business, and aside from nation states, organised criminals are one of the hardest adversaries to defend against. They are extremely well organised and well funded, with extensive experience attacking high value targets. Once they've breached the system, they have the contacts in to sell the data on and actually turn a profit on this kind of attack.

Conclusion


Security is far harder than most people think, and most people don't know how much they don't know. HSCIC cannot be in the position of implementing this system whilst remaining unaware of the serious and numerous risks outlined above.

The care.data scheme will suffer a breach, and given how centralised the system is likely to be, I expect it to be a large and very serious breach, of previously unheard-of proportions.