An introduction to rune
03 Jul 2017What is symbolic execution?
Symbolic execution implies assumption of abstract symbolic values for variables to determine flow behaviour of the program.
This concept was put forth in a paper titled “Symbolic Execution and Program testing” by James C. King. Rather than using the data objects provided by the programming language for storing values, we use arbitrary symbolic values for computation. As can be noted, normal computation is a special case of symbolic execution wherein we provide concrete inputs for the values.
Current state of the art
Symbolic execution saw a rise in interest and research due to increase in CPU speed and the ability to execute things parallely. It is one of the very few ideas which has made it to the industry and practice, breaking out of the academic bubble.
Symbolic emulators try to reason about execution paths in a program, generating path equations and collecting constraints on symbolic variables assuming them to be abstract values. We have both source-based and binary-based engines being worked upon right now. Although both are equally interesting, binary analysis is probably more challenging.
-
angr: angr is a python-based binary analysis framework written by students of UCSB. It combines both static and dynamic symbolic execution hence making it applicable for a lot of tasks. angr is made up of multiple submodules such as cle(binary loader), vex(code lifter), SimuVex(code simulation engine) and multiple others. It exposes a simple API for usage and is widely used in CTF contests.
-
Triton: It is another binary analysis framework by the folks at Quarkslab. Some of the features it provides are a taint engine, AST representation of the program and SMT simplification passes. It also has python bindings for scripting simple tools.
-
Manticore: This is a contribution by Trail of Bits to the open source community. Manticore is designed to be extremely simple to use and understand. It focuses on hooking onto instructions and state analysis and instrumentation. It also provides automated input generation and crash discovery. And yet again, it provides a Python API for analysis. It is supposed to be more intuitive to use than other tools. It does not have any kind of intermediate representation and lays emphasis on “staying close to machine abstractions”.
EDIT: Some of the others I managed to miss are BAP, OpenREIL, BitBlaze, PANDA and S2E. The list above is not meant to be exhaustive at all.
rune
For radare2 to become a swiss-knife in a reverse engineer’s toolkit, it is missing a symbolic emulation engine. That’s where rune comes in!
The complete architecture and API design is attributed to sushant94. Although in a very young stage, rune promises a flexible, modular structure and strives to extensible for security research. It is a symbolic emulator (engine) aimed at analysing parts of binaries for a better understanding of their execution semantics. rune can be used as a library for writing automated analysis and for manual debugging through an interactive console.
rune has several dependencies:
-
radare2(r2pipe): rune depends directly on radare2 to provide it binary specific information. Although it is currently restricted to getting architecture and platform information, I will be looking to extend it further to setup communication for retrieving relocation information, section boundaries and for loading static data while analysis. It should ideally act as a complete binary loader. Apart from this, radare2 also performs the task of converting low-level assembly instructions into ESIL. ESIL stands for Evaluable Strings Intermediate Language, which is a Forth like representation for better understanding CPU opcode semantics.
-
esil-rs: The ESIL string generated by r2 for each instruction is then converted into tokens after parsing which are then to be consumed by rune for analysis. esil-rs describes a parser and a lexer for ESIL. VM support is also planned and I will try to work on it as time progresses.
-
libsmt-rs: Every symbolic execution engine requires a solver interface and rune derives it’s from libsmt-rs. This library is designed to work with any solver which uses the SMTLibv2 format for checking satisfiability of assertions provided to it. Since we will be generating path equations for different code paths during the course of binary analysis, to solve for the constraints on the variables declared symbolic we need to generate a set of assertions and check what inputs make them satisfiable. We model our entire memory as a bitvector and currently only support 64 bit operations on registers and memory. Extending this is also on the RSoC todo-list.
-
radeco-lib: [EXPERIMENTAL!] The intermediate SSA form generated by radeco is useful for implementing multiple optimizations on the generated graph. It allows for things such as Dead Code Elimination, better Data-flow analysis and multiple others. For guided / targeted symbolic execution this particular representation is beneficial since we can then symbolically execute over the generated SSA rather than the entire CFG of the program since we already know branch decisions at multiple points. Complete support with radeco-lib is a major project altogether but laying foundations in the early stages is vital.
-
rados: [??] Every symbolic execution engine at certain stage of maturity will require an operating system and CPU model to back it up. This is especially useful when considering system calls, file IO, memory management and emulating overall environment behavior. rados design is currently under process and I would be happy to collect suggestions for the same. rados aims to be a very significant part of my project which I will be looking to start implementing soon!
Overall architecture
rune can be considered to be made up of multiple modules:
-
Stream: It acts a continuous stream of instructions for the Engine to emulate over. Stream uses the r2 API for extracting instructions at a particular address. As discussed earlier, it then uses the parsed tokens to feed the Engine.
-
Context: It contains the entire state of the program under execution. Context is constantly updated as the Engine operates on the instructions. Currently it contains a basic memory model, a register file, solver instance and the instruction pointer of the program. Initial context information/state can be set through RInitialState and be saved for reusability in analysis.
-
Console (runec): The console is meant to be a frontend command line interface for the tool. It can be used for analysis of bits and pieces of programs. This is what I have been working on during my initial stages of the project to make the console intuitive and easy to use for beginners.
-
Explorer: Explorer is how you guide the Engine to register next operation to be executed. We can set decisions for what to when the Engine registers a branch, or what job to be executed next. The Explorer can be directly controlled by through Console based commands.
-
Engine: This is the real workhorse of the entire emulator. The Engine provides capabilities for emulating over ESIL tokens and taking care of the effects due to those operations. The Engine works on the principle of RuneControl which is basically a way to drive it to perform certain actions. Formalizing how the Engine will behave is another part of my project.
Usage
rune is written in Rust(duh!). You can use the rsoc
branch for testing out my
progress. It currently works with the fix
branch of libsmt-rs(my fork) due to unmerged
build-fixes.
Why not have a demo?!
What does the future hold?
The future for rune does look promising! Apart from the ones discussed before, below are some of things I plan to work on in upcoming months:
-
Setting up a test infrastructure: It is important that while software development, we test as we go ahead. Right now, we have very few test cases setup for the project. One important task would be to have unit and integration tests for all modules described above.
-
Multithreaded path exploration: Path exploration is an embarassingly parallel problem. BFS and DFS styled explorers surely would benefit a lot from concurrent exploration. Also, rust!
-
Support for arbitrary bit operations: This seems to be a tough ask. Although I have some ideas, they seem pretty crude right now. Let’s see how it goes :)
That’s all from my side. I would like to thank all r2 devs to giving me this oppotunity to work on the project. I would love to hear from everyone about what they think about some of the ideas I have proposed. Hope you had a fun read. I will post another update as soon as I finish implementing another milestone.
Till then, get a pro r2 license! ;)