“Software Engineering is a way to conceive of both the problem and the solution.”
- Friedrich L. Bauer at the NATO Science Committee in 1967
Security Engineering today is primarily about security problems, and only then about solutions. I think that in order to solve these problems, Security Engineering has to become a crucial counterpart to modern Software Engineering, which builds on “sound Software Engineering principles” in order to “obtain economically software that is reliable and works efficiently” (F. L. Bauer).
One of these “sound Software Engineering” principles is to verify an implementation. Few software engineers seem to do this. I see electrical technicians use plenty of logical reasoning for circuit analysis and testing. Isn’t it strange that computer scientists fail to do the same for software, even though this is primarily our mathematical domain?
In the following I put together some ideas and motivation, to inspire change. None of this is perfect – it’s all work in progress. So feel free to grab the topics. They are free ;).
For software products there is a “trade-off between getting a solution or system to market versus a perfectly designed and bug free product” . In the metrics of a product’s Software Development Lifecycle (SDLC) this is already referred to as technical debt. This debt inherently creates security risks, which need to be assessed from within the SDLC. In order to estimate the consequences of this security debt, a short look at the Secunia Yearly Report 2011 (published in 2012)  reveals an archetype:
The Secunia Yearly Report 2011 shows a correlation between CVE counts and security risk development. It can be seen that the risks of security austerity are rising.
The number of Common Vulnerabilities and Exposures (CVE) entries the software industry’s top 20 vendors are facing shows a generally rising trend, both in risk and in volume. The history curves (from 2006 to 2011) show a continual, exponential rise. Security engineers already label this uncontrolled state of product security with the term security austerity. The rising risk that major software vendors expose their customers to can be further characterised with the sources of the Open Source Vulnerability DataBase (OSVDB) vendor dictionary: Adobe’s Acrobat Reader is a major statistical influence, as its 224 OSVDB IDs indicate . It’s a Commercial Off-The-Shelf (COTS) product with billions of installations. This leads to an estimate of the archetype: excessive security debt towards customers exists within COTS products. Therefore my initial conclusion, that few software engineers verify their implementations, stands supported.
 http://securitymetrics.org/content/attach/Essays/2012-03-05_-_Software_Security_Austerity.pdf quote from section 1, abstract
 http://secunia.com/?action=fetch&filename=secunia_yearly_report_2011.pdf from page 9 of the industry report
 http://www.osvdb.org/vendor/5011-adobe-systems-incorporated/1 OSVDB vendor dictionary for Adobe Systems
“I definitely believe that cryptography is becoming less important. In effect, even the most secure computer systems in the most isolated locations have been penetrated over the last couple of years by a series of APTs and other advanced attacks [...] We should rethink how we protect ourselves. Traditionally we have thought about two lines of defense. The first was to prevent the insertion of the APT with antivirus and other defenses. The second was to detect the activity of the APT once it’s there. But recent history has shown us that the APT can survive both of these defenses and operate for several years.”
- Adi Shamir at the Cryptographers’ Panel session at the RSA Conference 2013
Security austerity affects cryptography – we cannot isolate cryptography from its implementations. At a time when we discuss the NSA’s cryptanalytic capabilities in our threat models, this is a very negative side effect. PRISM has shown that nation states cannot rely on the integrity of their computer systems any longer. At least there is an urban legend that the Kremlin is using typewriters again ;).
Just an idea…
Personally I’m a fan of distcc, because I like to compile stuff early and often. It speeds up compile times in large projects, provided the libraries are present at each node of the distcc cluster. What would be the counterpart – a distdb, a distributed debugger?
Background: what does this have to do with it?
Few programmers today understand the relationship between code and assembly. “I let my code run in the debugger” – well, do you? Or do you (JIT-)compile a binary, instrument it in a debugger, and map it back to code? No compiler is perfect (still, before you report a compiler bug, be sure what you are doing). The point is: many programmers think their code is running on the machines, while it is not. They limit their perspective, which leads them to create testing scenarios which do not help the situation.
I think Software Testing has to include the bottom-up perspective: from bare-metal to high-level. I firmly believe that every limited perspective creates failure, and not just in security.
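Even in an interpreted setting the point holds: what executes is not the source but a compiled form, plus a mapping back to source lines that the debugger uses to keep up the illusion of "stepping through code". A minimal sketch of that mapping, using CPython's stdlib `dis` module (the function `clamp` is a made-up example):

```python
import dis

def clamp(x, lo, hi):
    # Source-level view: two comparisons and three possible returns.
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

# What actually runs is compiled bytecode; each instruction carries a
# source-line mapping, which is exactly what a debugger resolves when
# it pretends you are stepping through your code.
for ins in dis.get_instructions(clamp):
    print(ins.offset, ins.opname, ins.argrepr)
```

The same exercise with a C compiler and `objdump -dS` is more instructive (and more sobering), but the principle is identical.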
But isn’t bare-metal too verbose?
I’m also very interested in LLDB, because the LLVM disassembler promises easier parsing and expression handling of disassembly on LLVM IR. This can be useful for (assistive) plugins – maybe even for scenarios where black-box testing is performed.
- I’m not sure whether the mass of LLVM bytecode is more practical than CISC assembly in these scenarios. It also remains to be seen how LLVM’s disassembly works for MSR and SIMD instructions, which I see in code for Digital Signal Processing software (from a Software Defined Radio context, e.g. GNU Radio’s Vector-Optimized Library of Kernels (VOLK)). Leaving these specific use cases aside, LLVM bytecode is harder to understand than x86 – for a human / reverse engineer.
REIL  may be better suited. Or Vine IL… or a custom IL.
 http://www.zynamics.com/binnavi/manual/html/reil_language.htm – REIL is an Intermediate Language for automated evaluation of assembly languages like ARM, x86 or x86-64
Now let’s see: distributed debugging?! A mad idea? Useless for developers?
The idea of a distributed debugger has been taking shape for a while now – and I have made some efforts here and there to push Theorem guided software testing into this. “Theorem guided software testing” is a term I came up with accidentally, when I tried to describe what automata extraction by Model Checkers (MCs) does for a debugger – e.g. Spin does A_x. Or what problem Domain Models  (checked with Satisfiability Modulo Theories (SMT) solvers, for example) could do for a debugger.
However, these days I’m no longer SMT-only. And Spin’s A_x works on source – which is usually present for software testers doing an audit. Which answers the last question: it is useful for developers who want to rely on source code only. The fact that Model Checkers are not heavily used in software engineering is a problem.
The main reason to use MC or SMT in a debugger boils down to augmentation: the debugger is a tool which can get better if it is aware of the software it instruments. A debugger works on a compiled binary (with symbols), not on code. Is it a mad idea to inspect what is really executing, and to measure runtime behaviour, augmented by a debugger? MC and SMT  can provide boundaries for checks and measurements which a debugger can perform. These can be dynamic or static – because a modern debugger isn’t just a Dynamic Binary Instrumentation frontend. It’s a tool for problem-specific analysis. For me this answers the second question: it’s not a mad idea.
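To make "boundaries for checks" concrete, here is a toy sketch. A real setup would ask an SMT solver or model checker for an invariant; below, a brute-force stand-in over a small domain plays that role (the buffer size and the `watch` helper are illustrative assumptions, not a real debugger API):

```python
# Toy stand-in for a solver: derive the satisfying range of an invariant
# by enumeration over a small domain. A real pipeline would get these
# bounds from an SMT solver or a model checker instead.
def derive_bounds(constraint, domain):
    """Return (min, max) of all values in `domain` satisfying `constraint`."""
    sat = [v for v in domain if constraint(v)]
    return (min(sat), max(sat)) if sat else None

# Invariant for a hypothetical 16-byte buffer access: 0 <= idx < 16.
bounds = derive_bounds(lambda i: 0 <= i < 16, range(-64, 64))

def watch(idx, bounds):
    """Debugger-side check: is this concrete runtime value within the
    model-derived bounds? Anything outside gets flagged at the breakpoint."""
    lo, hi = bounds
    return lo <= idx <= hi

print(bounds)             # (0, 15)
print(watch(7, bounds))   # True: in-bounds access
print(watch(16, bounds))  # False: the off-by-one the debugger would flag
```

The division of labour is the point: the solver reasons about what *may* happen, the debugger checks what *does* happen.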
SMT and MC can need a lot of computing power – which means you need to scale horizontally across n nodes. It’s a complex task, though. And yes: this answers the first question – distributed debugging.
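The fan-out itself can be sketched in a few lines: shard the search space, farm the shards out, collect the witnesses. Threads stand in for remote nodes here, and the "invariant" (`x*x` fits into a signed 32-bit integer) is a deliberately trivial assumption so the sketch stays self-contained:

```python
from concurrent.futures import ThreadPoolExecutor

def find_violation(lo, hi):
    """Search one shard for the first input violating x*x < 2**31."""
    for x in range(lo, hi):
        if x * x >= 2**31:
            return x
    return None

# Partition the input space into shards, one per (simulated) node.
shards = [(i, i + 25_000) for i in range(0, 100_000, 25_000)]

with ThreadPoolExecutor(max_workers=4) as pool:
    hits = [h for h in pool.map(lambda s: find_violation(*s), shards)
            if h is not None]

print(min(hits))  # 46341 – the smallest witness, found by one shard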
Other reasons why you may want to distribute debugging are cryptologic tests. E.g. how well does a certain function seed the Random Number Generators (RNGs) on certain embedded devices? Or how well does this custom cryptologic algorithm scale over n runs on n different machines – a question which becomes interesting once hardware gets involved, like a CPU with AES-specific extensions, a GPU with vector processing optimisations, or a compiler which tries to optimise for speed by losing accuracy. The motto for a distributed debugger is: run early, run often – and collect the traces.
Maybe dynamic runtime behaviour has to be inspected to call a cryptologic implementation sound?
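As a sketch of "collect the traces and inspect them": pool RNG output from many runs and compute a Shannon entropy estimate over the bytes. `random.Random` is a stand-in for the instrumented device RNG, and a plain entropy estimate is of course only a first sanity check, not a proof of cryptographic soundness:

```python
import math
import random
from collections import Counter

def entropy_bits_per_byte(data):
    """Shannon entropy estimate of a byte string, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def run_once(seed, nbytes=256):
    """One 'run' of the device under test; random.Random is a stand-in
    for the embedded RNG whose seeding we want to inspect."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(nbytes))

# Run early, run often – and collect the traces.
traces = b"".join(run_once(seed) for seed in range(64))

good = entropy_bits_per_byte(traces)                # close to 8 bits/byte
bad = entropy_bits_per_byte(b"\x00" * len(traces))  # degenerate seeding: 0 bits

print(round(good, 2), bad)
```

Degenerate seeding (every run producing the same stream) shows up immediately in such pooled traces, which is exactly the kind of dynamic behaviour a static review of the source would miss.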
 http://goedel.cs.uiowa.edu/ – Combination Methods in Automated Reasoning are implemented into modern SMT solvers like Microsoft Z3 or CVC3.
 http://people.csail.mit.edu/vganesh/summerschool/ – Proceedings from the MIT SMT summit
 http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=13411 – Proceedings from the Dagstuhl Castle Program Analysis seminar
Run early, run often – and collect the traces. All hail the SDLC.
Workflow of a Domain-specific debugger
The client publishes the debug release to the debugging service (the Pandemonium service in this picture) and establishes a channel for remote debugging. This way the debug binary gets instrumented remotely, and processing the information resulting from runtime data tracking becomes a matter of the debugger’s inner workings. The workflow is to remain manually assisted, because in contrast to existing research implementations we want to augment the detection of security vulnerabilities, not automate it away.
 http://www.persistencelabs.com/s/acsac_2012.pdf – the research paper “Augmenting Vulnerability Analysis of Binary Code” argues that manually assisted vulnerability discovery has significant advantages.
 http://bitblaze.cs.berkeley.edu/temu.html – BitBlaze has shown the efficiency of solver-driven vulnerability analysis. However it focuses on automated reasoning and semantic analysis.
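The client/service handshake from the workflow above could look roughly like the following. Every field name (and the endpoint, "Pandemonium" included) is an illustrative assumption, not a defined protocol:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DebugSession:
    binary_id: str   # identifier (e.g. hash) of the uploaded debug release
    arch: str        # target architecture of the instrumented binary
    channel: str     # transport endpoint for the remote-debugging channel

def open_session(binary_id, arch, channel):
    """Serialize an 'open session' message for the debugging service."""
    msg = {"op": "open", "session": asdict(DebugSession(binary_id, arch, channel))}
    return json.dumps(msg)

# Hypothetical values throughout – hash, arch and endpoint are made up.
wire = open_session("sha256:deadbeef", "x86_64", "tcp://pandemonium:4711")
print(wire)
```

The point of such an explicit envelope is that the service, not the client, owns the instrumentation: the client only names the binary and the channel, and everything trace-related stays on the service side.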
Distribution of a Domain-specific debugger
My earliest approaches utilised MOSIX instead of any RPC / message bus system. I had experience with IBM WebSphere and CORBA ;). I quickly learned that there have been significant advances in this field. Solr also works well to index debug traces. But as people used to say… non sequitur…
Still… building a computation topology is a challenging task. I have some experience with distributed compilers and high-availability databases with shared transaction models. Still, the concept is work in progress here. And I don’t like to blog about my freestyle tree structures, because drawing them takes so much time – and TikZ, LaTeX or HTML math in a blog isn’t exactly mature technology ;).
 http://www.mosix.cs.huji.ac.il/ – MOSIX can be used to build computation grids
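What the Solr indexing amounts to, stripped to the bone, is an inverted index from trace tokens to the runs that produced them. The trace lines below are made up for illustration; a real deployment would feed Solr documents instead of a Python dict:

```python
from collections import defaultdict

# Hypothetical debug traces, one line of events per run.
traces = {
    "run-1": "malloc 32 free 32 read fd=3",
    "run-2": "malloc 64 read fd=3 write fd=4",
    "run-3": "free 64 write fd=4",
}

# Inverted index: token -> set of runs containing it. This is the core
# of what a search engine like Solr provides (minus ranking, faceting,
# and scale).
index = defaultdict(set)
for run_id, line in traces.items():
    for token in line.split():
        index[token].add(run_id)

print(sorted(index["malloc"]))  # runs that allocated
print(sorted(index["fd=4"]))    # runs touching descriptor 4
```

Being able to ask "which runs ever touched descriptor 4?" across thousands of collected traces is exactly the query shape that makes indexing worth the plumbing.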
Ergonomics of a Domain-specific debugger
I don’t see much reason to keep debugging purely textual. IDA Pro has a graph-based debugger with a cross-platform canvas, and so has BinNavi. The latter already has an API and a database, but it would need many improvements to export the necessary functions.
VizSec has some inspirational submissions on graphical reports.
I have looked at mxGraph, because I think a web frontend could work well, but I’m not sure here. If multiple users are present, the database side could get extremely complex – unless I’m mistaken and master-master replication already works reliably with two nodes. I’m not much into functional programming with NoSQL databases – maybe that is an option.
Runtime environment for the instrumented release?
The debugger instruments the binary remotely; a client can map code (if present) to the instrumented sections.
Remotely it is possible to use VMs, which can be snapshotted and suspended. This way you can persist certain scenarios locally and correlate them with the client. Technologies I have in mind: LVM snapshots with bare volatile data acquisition (e.g. with Volatility), or Qemu’s qcow2 with libvirt – very much depending on the kind of release and its dependencies. Maybe there are better ways.
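For the libvirt route, the snapshot/suspend cycle boils down to a couple of `virsh` calls, which a harness could assemble like this. The domain and snapshot names are illustrative, and you should verify the subcommands against your libvirt version before relying on them:

```python
import shlex

def snapshot_cmd(domain, name):
    """virsh invocation to snapshot a running domain before instrumentation."""
    return ["virsh", "snapshot-create-as", domain, name]

def suspend_cmd(domain):
    """virsh invocation to suspend a domain, freezing the scenario."""
    return ["virsh", "suspend", domain]

# Hypothetical names – a real harness would track these per scenario.
cmd = snapshot_cmd("debug-target-vm", "pre-instrumentation")
print(shlex.join(cmd))
```

Pair each snapshot with the client-side trace metadata and you can roll a scenario back, replay it, and diff the runtime behaviour between runs.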
Debuggers are about to change into software inspection tools. While this short essay did not go into detail on how problem domains can be modelled with MCs and SMT solvers, such specific models still have to be created. This is the stuff papers can be written about.
So currently I’m looking at QtCrypto (Qt5 QCA), which seems to be a nice first candidate for some models.