Alphanumeric Shellcode Generators
Dec 28, 2014 00:00 · 679 words · 4 minutes read
A common and important class of attack on computer systems is the code injection attack. This attack has two phases: a) injection of code (a.k.a. the shellcode), and b) execution of the injected code. Typically, code is injected in placeholders for data. So, while the target program expects data, the attacker instead sends code (disguised as data). The attacker then redirects the program execution to the injected code. To do this, the attacker can exploit program vulnerabilities such as buffer overflows.
The ASCII ranges 0x30−0x39 (0-9), 0x41−0x5a (A-Z) and 0x61−0x7a (a-z) form the alphanumeric character set. Shellcodes typically consist of bytes that are not alphanumeric. To counter shellcode injection, we can inspect each byte of the incoming data and discard any byte that is not an alphanumeric character. Such filtering does not provide adequate protection, as it is feasible to construct shellcodes that consistent of only alphanumeric bytes. However, constructing alphanumeric shellcodes by hand is a non-trivial and tedious task. Rix [2] developed a tool to automate the conversion of non-alphanumeric shellcode into alphanumeric shellcode for the x86 architecture. His tool encodes the non-alphanumeric bytes into alphanumeric bytes and further embeds instructions within the output shellcode, whose purpose is to decode (or recover) the original bytes at runtime. Rix uses the XOR instruction (which is alphanumeric for many combinations of operands on x86) to recover the non-alphanumeric bytes of the shellcode. The XOR instructions use specific hardcoded constants for each non-alphanumeric byte that is encoded. The most important drawback of Rix’s approach is that every non-alphanumeric byte of the original shellcode requires separate instructions to be embedded in the modified shellcode. This increases the size of the modified shellcode, which typically is over 4 times the size of the original shellcode.
Jan Wever introduced the looped decoding approach as an alternative to the sequential decoding used by Rix. The encoding scheme used by Wever modifies both alphanumeric and non-alphanumeric bytes of the shellcode. The decoding logic is implemented in the form of a loop, whose size is independent of the size of the encoded shellcode. Using a fixed size decoder helps to reduce the size of the modified shellcode, which is important due to the constraints placed on the shellcode.
The exploit shellcodes typically spawn a shell, copy a file (like
passwd
), export, and so on. Such shellcodes have alphanumeric characters like filenames (e.g. “/bin/sh”
or “/etc/passwd”) and ports (e.g. 8080) in them, to name a few. Over and above, there are some instructions that are partly or completely alphanumeric. So, if we only patch the non-alphanumeric bytes in the shellcode, then it results in a more compact alphanumeric shellcode encoding. Using a looped decoder, as opposed to a sequential decoder, also helps to reduce the size of the final alphanumeric shellcode. In this paper we propose two new encoding schemes: Non-Alpha Touch and Alpha Freedom. The main idea behind the proposed schemes is to patch only the non-alphanumeric bytes (Rix’s idea), but using a looped decoder (Jan Wever’s idea). The performance of our schemes depends on the number of non-alphanumeric bytes present in the original shell- code. We demonstrate that our schemes yield more compact encodings than Jan Wever’s Encoder, for many shellcodes.
In the Non-Alpha Touch scheme, a fixed alphanumeric byte (called the alpha mark ) is inserted before every non-alphanumeric byte in the original shellcode. Each non-alphanumeric byte is replaced with two corresponding alphanumeric bytes, which represent the encoded form of the non-alphanumeric byte. At runtime, the decoding loop uses the alpha mark to determine the portions of the shellcode which need to be decoded. This scheme uses three bytes to encode each non-alphanumeric byte in the original shellcode.
In the Alpha Freedom scheme, we add a tweak to the encoding scheme, which allows the decoding loop to determine the need for decoding without the alpha mark . This helps further reduce the number of encoding bytes to two for every non-alphanumeric byte in the original shellcode. This is done by constricting the range of allowed alphanumeric values that can be used in the encoded shellcode.