Friday 21 March 2014

On NaCl

Overview

Since I was introduced to the NaCl library, I've loved it. In the course of considering adding Salsa20/? and Poly-1305 support to OpenSSL, I thought I'd take a look at NaCl, since I knew it was based on those primitives as specified in Bernstein's papers.

Acknowledgments

Thanks to Bernstein for the NaCl library, Regher for his blog on software correctness and David Morris (qlkzy)  for his proof-reading skills and contributions.

Getting the Code

The code is public domain, and very small. However, I could only find mirrors of it in GitHub, and no obvious reference to the "current" release. The best I could do was a tar.bz2 referenced from the install page.

I didn't want to install it, only read the code! Please, please throw up a prominent link to the following:
  • Source repository
  • Documentation
  • Issue Tracker
The OpenSSL project has all of these, despite being in the gutter in many other respects. They're not necessarily in a "prominent" place, and in the case of the documentation, they're incomplete.

Currently, simply pointing to http://hyperelliptic.org/nacl/nacl-20110221.tar.bz2 is insufficient, as it raises a couple of questions:
  • Is this the latest release? If not, which is the latest?
  • Why is it not distributed from somewhere supporting SSL/TLS?
  • Why is it on hyperelliptic and not nacl.cr.yp.to as the rest of the site is?
These are minor things, but they are crucial to ensure confidence in the project, it's long-term survival, and that any one wishing to contribute can do so effectively.

Duplicate Code

Nothing too worrisome, but there are a few pieces of duplicate code. They're mostly utility functions like load_littleendian, and rotate. The drawbacks of writing code multiple times are well documented, and I don't know why Bernstein chose to duplicate the code (they're declared static) rather than pull them up into a utility library of some sort. Hell, given how short these functions are, not making them simply macros seems a little odd.

The methods I saw duplicated most often are marked static, and as pointed out by qlkzy, they're probably there to ease the compiler's job when in-lining functions.

Undefined Behaviour

There is at least one function used regularly throughout the source (and duplicated!) that could be considered to have undefined behaviour. I have copied it here from crypto_core/salsa20/ref/core.c:

static uint32 rotate(uint32 u,int c)
{
  return (u << c) | (u >> (32 - c));
} 
 
This function is used as part of a round for Salsa20, and it is only called with c with a fixed set of values, 7, 9, 13 and 18.

This is great, because if the compiler can figure out that it's possible set of input values never includes 32, great!

If, however, the compiler isn't so smart, then it may assume that c can be the full range of int, meaning that it is undefined on several values (most notably c = 32), meaning that it could be optimised to whatever the compiler feels like.

Regehr has an excellent blog post on this already, but for the general case, where the equivalent of c can be any value. In Salsa20, we don't need such generality.

There are a couple of potential fixes off the top of my head:

Use Regehr's Safe Rotate

Using the safe rotate proposed in Regehr's blog should alleviate these problems, but we don't actually need the generality.

There is also the issue that on older versions of Clang, his safe rotate is not recognised and optimised correctly, introducing some problematic performance issues

According to the log on Regher's bug, this has probably been fixed by now, and so could be safely used.

Define Multiple Rotate Functions

Define several functions, rotate_7(uint32 u), rotate_9(uint32 u), rotate_13(uint32 u), and rotate_18(uint32 u). for example:

static uint32 rotate_7(uint32 u)
{
  return (u << 7) | (u >> 25);
} 

Which would ensure that the compiler knows that the shift is never undefined. hopefully, a good compiler would aggressively inline these.

Define Macros

Defining a macro would give a couple of benefits:

  • It could be put into a header file somewhere and used everywhere
  • It would be inlined automatically by the pre-processor
  • It can be made to avoid most forms of undefined behaviour
The macro would simply be :

#define ROTATE_7(u) (((u) << 7) | ((u) >> 25))

That would allow for a rotate-left by 7, avoiding any undefined behaviour and forcing the C pre-processor to inline it for you. These can be written once in a header file.

However, many C programmers do feel that macros are quite messy, and so would prefer to simply declare non-static utility functions in a header and use those, but copilers might not inline these, leading to an unacceptable performance penalty.

Ok Compilers

Once I got the code, I found a list called "okcompilers", posted here for completeness:

gcc -m64 -O3 -fomit-frame-pointer -funroll-loops
gcc -m64 -O -fomit-frame-pointer
gcc -m64 -fomit-frame-pointer
gcc -m32 -O3 -fomit-frame-pointer -funroll-loops
gcc -m32 -O -fomit-frame-pointer
gcc -m32 -fomit-frame-pointer
spu-gcc -mstdmain -march=cell -O3 -funroll-loops -fomit-frame-pointer -Drandom=rand -Dsrandom=srand
spu-gcc -mstdmain -march=cell -O -fomit-frame-pointer -Drandom=rand -Dsrandom=srand

This has the really bad effect of tying yourself to a given set of compilers and flags. This means that you won't get any new features or tooling from newer compilers (e.g. Clang with Valgrind) and that those who don't have access to the specified compilers will not feel at ease when building the source.

Access to the latest and greatest tooling is often critical for tracking down bugs, memory leaks, etc.

Will it break, won't it break should not be in my mind when I'm building with a different, but still standards compliant, compiler.

There is also the issue of knowing exactly which flags are required for performance, and which are required for security. What happens if I can't use -funroll-loops, does this break the performance, or does it affect the timing invarience of  the algorithms? What about -O3, is that only for performance purposes? This is about as clear as mud.

Non-Standard Build

The installation proceedure recommends I use the "./do" command to build the source. Just use Make, we all know how to use it. It's standard, well understood and easily extensible.

I wouldn't normally advocate Make, but something is better than nothing, and for a small project like this, I don't think it would need anything more advanced than Make.

Supplying Assembler Alternatives

This seems like a good idea at first, you can hand optimise the assembler better than a compiler, and in the case of C, avoid undefined behaviour! Win!

However, these are static blobs in comparison to C code. C code inherits a lot of benefits from compiler development. For example, I don't think you'd be able to do anything with the assembler blob and valgrind or ctgrind.

You also can't hand optimise better than a compiler, not any more. And while attacks only get better, compilers only get more aggressive optimisations.

Conclusion

I love the idea of the NaCl library, but I think for it's long-term survival, it needs to pull up it's "administrative socks" as it were. Potential undefined behaviour should be a huge no-no in cryptographic software, and standard software engineering practices; such as a standardised build system, version control, and so on; should be a given

I would hate for such a fantastic library and concept to disappear or fall by the wayside because of such trivial issues.

No comments:

Post a Comment