Saturday, 8 August 2015

SQLite and Testing

Categorical claims are often the source of faulty statements. "Don't test with SQLLite [sic] when you use Postgres in Production"  by Robellard is a fantastic example. I actually agree with a variant of this statement: "If you need high levels of assurance, don't test with SQLite alone when you use Postgres in production."

Robellard bases his claim on several points, noting that "SQLite has different SQL semantics than Postgres," "SQLite has different bugs than Postgres," and "Postgres has way more features that SQLite." He has a couple more points, but all of these largely amount to a discrepancy between SQLite and Postgres, or between one Postgres version and another, leading to a defect. These points are a genuine concern, but his claim relies on using exactly one database back-end for testing, and exactly one risk profile for various applications.

As a quick diversion, I am not using the common definition of risk which is synonymous with chance. I am using a more stringent definition: "the effect of uncertainty on objectives" as specified in ISO Guide 73:2009. This definition often requires an assessment of both the impact and likelihood of some form of scenario to obtain a fuller picture of an "effect."

If the risk posed by defects caused by an SQLite-Postgres discrepancy is too high, then you'll likely want use Postgres as part of your testing strategy. If the risk posed is sufficiently low, then SQLite alone may be appropriate. These are predicated on the risk posed by defects, and the organisational appetite for risk.

A testing strategy comprising several different testing methodologies can often be thought of as a filter of several layers. Different layers are variously better or worse at surfacing different types of defects. Some are more likely to surface defects within components, and others are better at locating defects in the interactions between components. Other "layers" might be useful for catching other classes of defects. Each layer reduces the likelihood of a defect reaching production, which reduces the risk that defects pose. Each layer also has a cost associated with writing and maintaining that layer.

It's quite common for different layers to be run at different times. For instance, mock-based unit tests might be run very frequently by developers. This provides the developers with very quick feedback on their work. Integration tests backed by an in-memory database might be run prior to committing. These take a little longer to run and so might get run less often, but still catch most problems caused by erroneous component interactions. A continuous integration (CI) server might run integration tests backed by Postgres, and slower UI tests periodically. Finally, penetration tests might be conducted on a yearly or six-monthly basis.

This sort of process aims to allow developers the flexibility to work with confidence by providing quick feedback. However, it also provides heavier-weight checking for the increased levels of assurance required for the risk-averse organisation. An organisation with a greater appetite for risk may remove one or more of those layers, such as in-memory integration tests, to speed development. This saves them money and time but increases their exposure to risk posed by defects.

SQLite is just a tool which may be used as part of one's testing strategy. Declaring "Don't test with SQLLite [sic] when you use Postgres in Production" ignores how it may be usefully applied to reduce risk in a project. In many cases SQLite is entirely appropriate, as the situation simply does not require high levels of assurance. In other cases, it may form part of a more holistic approach along side testing against other database backends, or be removed entirely.

Not every organisation is NASA, and not every project handles secrets of national import. Most failures do not kill people. An honest assessment of the risks would ideally drive the selection of the testing strategy. Often-times this selection will be balanced against other concerns, such as time-to-market and budget. There is no silver bullet. A practical, well-rounded solution is often most appropriate.

Saturday, 25 July 2015

Infosec's ability to quantify risk

In Paul Graham's latest post, "Infosec's inability to quantify risk," Graham makes the following claim:
"Infosec isn't a real profession. Among the things missing is proper "risk analysis". Instead of quantifying risk, we treat it as an absolute. Risk is binary, either there is risk or there isn't. We respond to risk emotionally rather than rationally, claiming all risk needs to be removed. This is why nobody listens to us. Business leaders quantify and prioritize risk, but we don't, so our useless advice is ignored."

I'm not going to get into a debate as to the legitimacy of infosec as a profession. My job entails an awful lot of infosec duties, and there are plenty of folks turning a pretty penny in the industry. It's simply not my place to tell people what they can and cannot define as a "profession."

However, I do take issue with the claim that the infosec community lack proper risk analysis tools. We have risk management tools coming out of our ears. We have risk management tools at every level. We have those used at the level of design and implementation, for assessing the risk a vulnerability poses to an organisation, and even tools for analysing risk at an organisational level.

At the design and implementation level, we have software maturity models. Many common ones explicitly include threat modelling and other risk assessment and analysis activities.

One of the explicit aims of the Building Security in Maturity Model (BSIMM) is "Informed risk management decisions." Some activities in the model include "Identify PII obligations" (CP1.2) and "Identify potential attackers" (AM1.3). These are the basic building blocks of risk analysis activities.

The Open Software Assurance Maturity Model (OpenSAMM) follows a similar pattern, including a requirement to "Classify data and applications based on business risk" (SM2) and "Explicitly evaluate risk from third-party components" (TA3).

Finally, the Microsoft Security Development Lifecycle requires that users "Use Threat Modelling" to "[...] determine risks from those threats, and establish appropriate mitigations." (SDL Practice #7).

So, we can clearly see that risk analysis is required during the design and implementation of a system. Although no risk management methodology is prescribed by the maturity models, it's easy to see that we're clearly in an ecosystem that's not only acutely aware of risk, but also the way those risks will impact organisational objectives.

If these maturity models fail to produce adequately secure software, we need to understand how bad a vulnerability is. Put simply, statements like "On the scale of 1 to 10, this is an 11" are not useful. I understand why such statements are sometimes necessary, but I worry about the media becoming fatigued.

Vulnerabilities are classified using one of several methods. Off the top of my head, I can think of three:
  1. Common Vulnerability Scoring System (CVSS)
  2. DREAD Risk Assessment Model (Wikipedia)
  3. STRIDE (Wikipedia)
These allow for those with infosec duties to roughly determine the risk that a vulnerability may pose to their organisation. Put simply, they allow for the assessment of the risk posed to one's systems. They are a (blunt) tool for risk assessment.

Finally, there are whole-organisation mechanisms for managing risks, which are often built into an Information Security Management System (ISMS). One of the broadest ISMS standards is BS ISO/IEC 27001:2013, which states:
"The organization shall define and apply an information security risk assessment process [...]"
If this seems a bit general, you should be aware that an example of a risk management process (which includes mechanisms for risk assessment & analysis) is available in BS ISO/IEC 27005:2011.

Let's look at the CERT Operationally Critical Threat, Asset, and Vulnerability Evaluation (OCTAVE) Allegro technical report:
"OCTAVE Allegro is a methodology to streamline and optimize the process of assessing information security risks [...]"
Similarly, Appendix A provides guidance on risk management, which includes sections on risk assessment and analysis.

Yet another standard is NIST SP800-30 Revision 1, "Guide for Conducting
Risk Assessments". It states it's purpose quite clearly in section 1.1 "Purpose and Applicability"
"The purpose of Special Publication 800-30 is to provide guidance for conducting risk assessments [...]"
NIST SP800-30 Revision 1 also provides an example of how to conduct a risk assessment.

As you can see, members of the infosec community have quite a few tools for risk assessment and analysis at our finger-tips. From the design and implementation of software, through to the assessment of individual vulnerabilities, and even for assessing, analysing, and mitigating organisational risk, we're well equipped.

The infosec community is often very bad at communicating, and the media likes a salacious story. How often have you heard that a cure for cancer has been found, sight returned to the blind, and teleportation achieved? Recently, members of the infosec community have played into this, but that does not eliminate the fact that we do have tools for proper risk management. Our field is not so naive that we blindly believe all risk to be unacceptable.

Sunday, 7 June 2015

The Four Words of Distress

The title of this post is taken directly from The Codeless Code, "The Four Words of Distress"

I read it again yesterday, and today I have been bitten by it.

I accidentally removed the content of a comment on this blog. I don't get many comments, so I feel that all of them are important contributions (except the incessant 50 Shades of Grey spam, urgh, do I regret writing about that.) As such, removing a comment is "serious" to me.

Upon realising my error, I immediately searched the page for an undo or a "restore" link of some sort, and eventually went to search for an answer to my problems. I only found unanswered questions on Google's Groups, and a Product Help page claiming that I should've been sent to a confirmation page (no such thing happened)

To have fixed this problem, there would have been any number of ways to deal with it:

  1. A confirmation dialogue, alongside a "don't show this again" check box.
  2. A trash can, which comments can go to for a fixed period of time.
  3. An undo button
  4. A help guide.
As it stands, I accidentally clicked the "Remove Content" link, and now, without warning, the comment has gone. I worry that this is a black mark against the commenter's account, when it is a simple mistake.

Saturday, 6 June 2015

Chrooting a Mumble Server on OpenBSD

One of my colleagues is starting remote working shortly. As such, we needed a VoIP solution that worked for everyone, Mac, Linux and FreeBSD. It was discovered that Mumble provided ample quality and worked everywhere. Top it off with the fact that we could host it ourselves, and we looked to be set.

However, being security conscious, I like to sandbox any service I have. Further, since this solution is slightly ad-hoc at the moment, it's being run off a personal BigV server, running OpenBSD. So I set out to chroot the package-managed mumble server, umurmur, which does not include sandboxing as default.

Fortunately, if no logfile is specified, umurmurd will log to syslog, and it does not support config reloading, so I don't need to worry about that.

Chrooting is not entirely simple, and simple versions can be improved by "refreshing" the chroot every time the service starts. This means that if an attacker infects the binary in some way, it'll get cleared out and replaced after a restart. As another bonus, if a shared library is updated, it simply won't get found, which tells you what to update! If the binary gets updated, it'll be copied in fresh when the service is restarted.

To do this, we modify /etc/rc.d/umurmur:

# $OpenBSD: umurmurd.rc,v 1.2 2011/07/08 09:09:43 dcoppa Exp $
# 2015-06-06, Turner:
# Jails the umurmurd deamon on boot, copies the daemon binary and
# libraries in each time.
# An adversary can still tamper with the logfiles, but everything
# else is transient.

daemon="chroot $chroot $original_daemon"

build_chroot() {
  # Locations of binaries and libraries.
  mkdir -p "$chroot/usr/local/bin"
  mkdir -p "$chroot/usr/lib"
  mkdir -p "$chroot/usr/local/lib"
  mkdir -p "$chroot/usr/libexec"

  # Copy in the binary.
  cp "$original_daemon" "$chroot/usr/local/bin/"

  # Copy in shared libraries
  cp "/usr/lib/" "$chroot/usr/lib/"
  cp "/usr/lib/" "$chroot/usr/lib/"
  cp "/usr/local/lib/" "$chroot/usr/local/lib/"
  cp "/usr/local/lib/" "$chroot/usr/local/lib/"
  cp "/usr/lib/" "$chroot/usr/lib/"
  cp "/usr/libexec/" "$chroot/usr/libexec/"

  # Setup /etc and copy in config.
  mkdir -p "$chroot/etc/umurmur"
  cp "/etc/umurmur/umurmur.conf" "$chroot/etc/umurmur/umurmur.conf"
  cp "/etc/umurmur/certificate.crt" "$chroot/etc/umurmur/certificate.conf"
  cp "/etc/umurmur/private_key.key" "$chroot/etc/umurmur/private_key.key"

  # Setup the linker hints.
  mkdir -p "$chroot/var/run/"
  cp "/var/run/" "$chroot/var/run/"
  cp "/usr/libexec/" "$chroot/usr/libexec/"

  # Copy the pwd.db password database in. This is less-than-ideal.
  cp "/etc/pwd.db" "$chroot/etc/"
  grep "$group" "/etc/group" > "$chroot/etc/group"

  # Setup /dev
  mkdir "$chroot/dev"
  mknod -m 644 "$chroot/dev/urandom" c 1 9
  mknod -m 644 "$chroot/dev/null" c 1 3

destroy_chroot() {
  if [ "$chroot" ]
    rm -rf "$chroot"

case "$1" in
# Standard rc.d "stuff" here.
. /etc/rc.d/rc.subr


rc_cmd $1

So there we go! When /etc/rc.d/umurmurd start is called, the chroot is setup, and umurmurd started in there. When you kill it, the chroot jail is emptied.

There are some limitations. For one, any private key (in the default, it's private_key.key) can be compromised by an attacker who can compromise umurmurd, and this can be used to impersonate the server long after the compromise. Secondly, if you do specify a log file in umurmur.conf, and you setup the relevant directory for logging to, it will be trashed when you stop the daemon. This is a real problem if you're trying to workout what happened during a compromise.

Finally, if umurmur is updated, and the required libraries do change, "ldd /usr/local/bin/umurmurd" will list the new shared objects.

Known Issues

This does not currently stop the umurmur daemon on stop. I'm not entirely sure why, but the work around is to stop the service using /etc/rc.d/umurmurd stop, then find it using ps A | grep umurmur and kill -15 it.

Sunday, 31 May 2015

Refactoring & Reliability

We rely on so many systems that their reliability is becoming more and more important.

Pay bonuses are determined based on the output of performance review systems; research grants handed out based on researcher tracking systems, and entire institutions may put their faith for visa enforcement in yet more systems.

The failure of these systems can lead to distress, financial loss, the closure of the organisations, or even prosecution. Clearly, we want these systems to have a low failure rate; be they design flaws or implementation defects.

Unfortunately for the developers of the aforementioned systems, the all have a common (serious) problem: the business rules around them are often in flux. Therefore, the systems must have the dual property of flexibility and reliability. Very often, these are in contradiction to one another.

Reliability requires requirements, specification, design, test suites, design and code review, change control, monitoring, and many other processes to prevent, detect, and recover from failures in the system. Each step in the process is designed as a filter to deal with certain kinds of failures. Without them, these failures can start creeping into a production system. These filters also reduce the agility of a team; reducing their capability to respond to new opportunities and changing business rules.

On the other hand, the flexibility demanded by the team's environment is often attained through the use of traditional object-orientated design. This is typically achieved by writing to specific design patterns. If a system is not already in a state that is considered to be "good design," a team will apply refactorings.

Refactorings are small, semantics-preserving changes to the source of a system, with the goal of migrating towards a better design. This sounds perfect. Any analysis and testing which took place prior to the refactoring should still be valid! [1].
However, even though the semantics of the source are preserved (although, humans do occasionally make mistakes!), other observable properties of the program are not preserved. Any formal argument that was made regarding the correctness, or time, space or power requirements may not be valid after the refactoring.

Not only does the refactoring undermine any previous formal argument, it can often make it more difficult to construct a new argument for the new program. This is because many of the refactoring techniques given introduce additional indirection, duplicate loops, or use dynamically allocated objects. These are surprisingly difficult to deal with in a formal argument. So much so that many safety-critical environments simply do not support them, for example, SPARKAda. In many common standards aimed at safety critical systems, they are likewise banned.

I am not arguing against refactoring. I think it's a great tool to have in one's toolbox. I also think that like any other tool, it needs to be used carefully and with prior thought. I'd also shy away from the idea that just because something's important, it is critical. With a suitable development process, a development team can remain agile whilst still reducing the risk of a serious failure to an acceptable level.

In the end, it's exactly that -- the balance of risks. If a team is not responsive, they may miss out on significant opportunities.To mitigate this risk, teams introduce flexibility into their code through refactoring. To mitigate the risk of these refactorings causing a serious failure [2], the team should employ other mitigations, for example, unit and integration testing, design and code review, static analysis, and so on. Ideally, to maintain the team's agility, they should be as automated and integrated into their standard development practice as possible. Each team and project is different, so they would need to assess which processes best mitigate the risks, whilst maintaining that flexibility and agility.

[1] Fowler states that for refactorings to be "safe", you should have (as a minimum) comprehensive unit tests.
[2] Assuming that the "original" system wasn't going to cause a serious failure regardless.

Saturday, 11 April 2015

All White Labour?

With the up and coming general election, we've been receiving election material.

When my other half mentioned that the leaflet we received from the Labour candidate for York Central looked a little bit "overly white" (paraphrasing), I  decided to run the numbers.

The York Unitary Authority's demographics, from the 2011 census show that we have 94.3% "white" people [1].

We sat down and counted the faces on the leaflet, excluding the candidate themselves. We came to a count of 14 faces, all of which were white.

The chance of picking 14 people at random from a population which is 94.3% white and getting 14 white people is 43.97%. That means that the chance of getting at least one non-white person on the leaflet would've been 56.03%.

Obviously, this is quite close to a toss-up, but bear in mind that these people aren't usually selected for at random. All sorts of biases go into selecting people for photo shoots, from who turns out, to who interacts with the candidate, who the photographer chooses to take photographs of and who is selecting which photos from the shoots end up on the page and their biases towards what will "look good," and what is expected to promote an image of diversity.

Anyways, I don't want to say one way or the other about what this does or does not mean, I just want the data to be available for people.


[1] : 2011 Census: KS201EW Ethnic group, local authorities in England and Wales (Excel sheet 335Kb) (*&newoffset=0&pageSize=25&edition=tcm%3A77-286262) Accessed: 2015-04-11

Monday, 6 April 2015

OpenBSD time_t Upgrade

Last night I foolishly undertook the upgrade from 5.4 to 5.5, without properly reading the documentation. My login shell is zsh, which meant that, when the upgrade was complete, I couldn't login to my system.

I'd get to the login prompt, enter my credentials, see the motd, and be kicked out as zsh crashed, due to the change from 32-bit time_t to 64-bit time_t change. I'd also taken the security precaution of locking the root account.

If fixed this as follows:

  1. Reboot into single-user mode (boot -s at the boot> prompt)
  2. Mounted my filesystems (they all needed fsck running on them, before mount -a)
  3. Changed my login shell to /bin/sh (chsh -s /bin/sh <<username>>)
  4. Rebooted.
After that, it was a simple question of logging in and doing the usual; update my PKG_PATH to point at a 5.4 mirror, and running "pkg_add -u" to upgrade all my affected packages.

I then continued on to upgrade my system to OpenBSD 5.5.

A quick warning: This is one particular failure mode arising from not reading the docs. It may get you back into your system, but it's unlikely to fix all your woes if you don't read the docs.

So, the moral of the story is: ALWAYS READ THE DOCS.