Wednesday 29 July 2015

(N)ASM101 - 0x03 - Improving strlen()

Previously we've looked at how Win32 API functions are called and how to create our own function which computes the length of a string and stores the result in a register. Some people have pointed out to me that my construction of strlen() is very long-winded. The reason I did this is that the implementation I presented uses simpler ASM instructions and I didn't want to overwhelm readers who are new to this field. On the other hand, now is a good time to introduce these improvements.

x86 Instructions


When it comes to repetitious data movement and comparison, the Intel instruction set provides us with native instructions which faciliate these operations. These conventient instructions will be presented later in this section but first let's cover a few new ASM instructions which will be useful in constructing the improved version of strlen().


NOT


The NOT instruction performs a bitwise NOT of a register or a memory location and stores it in that same location. A logical NOT essentially inverts input bits: NOT 1 = 0 and NOT 0 = 1. Inverting the bits of an integer value is called computing the Ones' Complement of an integer. The following are 2 examples of such:


  +-------------------------------------------------------------------+----------+----------+
  |   +/-  |        |        |        |       |       |       |       | 1s' Comp | Unsigned |
  |--------+--------+--------+--------+-------+-------+-------+-------+----------+----------|
  |    0   |   0    |   0    |   0    |   0   |   0   |   0   |   0   |    0     |    0     |
  |--------+--------+--------+--------+-------+-------+-------+-------+----------+----------|
  |    0   |   1    |   1    |   1    |   1   |   1   |   0   |   1   |   125    |   125    |
  +--------+--------+--------+--------+-------+-------+-------+-------+----------+----------|
  |    1   |   0    |   0    |   0    |   0   |   0   |   1   |   0   |  -125    |   130    |
  |--------+--------+--------+--------+-------+-------+-------+-------+----------+----------|
  |    1   |   1    |   1    |   1    |   1   |   1   |   1   |   1   |   -0     |   255    |
  +-------------------------------------------------------------------+----------+----------+
 

This operation is synonymous to XORing with 0FFFFFFFFh, since 0 XOR 1 = 1 and 1 XOR 1 = 0, and to subtracting from 0FFFFFFFFh, where 0FFFFFFFFh is the 1s' complement representation of -0.

An evident issue with this representation is that there are 2 representations of 0. Also, the addition of 2 numbers in their 1s' complement representation is not always equal to the 1s' complement representation of the addition of the 2 numbers in decimal form. This irregularity happens when the result overflows and has to be accounted for with a carry:


       0001 1111     31 +
     − 1111 1000     −7
     -----------   ----
     1 0001 0111     23   <-- Overflow
     + 0000 0001      1   <-- End-around carry
     -----------   ----
       0001 1000     24   <-- Result: 31 + (-7)


These issues have been circumvented by using the Two's complement representation system which will be discussed in the next subsection.



NEG


NEG performs the Two's Complement of the operand's value and overwrites it with the newly computed value. The Two's Complement of a binary number is obtained in the following manner:
  1. Scan from right to left until the 1st "1" is found
  2. Perform Ones' Complement on the bits to the left of the "1"
OR:
  1. Perform Ones' Complement
  2. Add 1
As an example:


  +-------------------------------------------------------------------+----------+----------+
  |  -128  |   64   |   32   |   16   |   8   |   4   |   2   |   1   | 2s' Comp | Unsigned |
  |--------+--------+--------+--------+-------+-------+-------+-------+----------+----------|
  |    0   |   0    |   1    |   0    |   1   |   1   |   0   |   0   |    44    |    44    |
  |--------+--------+--------+--------+-------+-------+-------+-------+----------+----------|
  |    1   |   1    |   0    |   1    |   0   |   1   |   0   |   0   |   -44    |   212    |
  +-------------------------------------------------------------------+----------+----------+


The advantages of this system is that there's only 1 representation of 0 and the addition, subtraction or multiplication operations are identical to those for unsigned binary numbers. An overflow is simply discarded. Let's go through the previous example using the 2s' Complement system:


       0001 1111     31 +
     − 1111 1001     −7
     -----------   ----
     1 0001 1000     24   <-- Discard Overflow
     -----------   ----
       0001 1000     24   <-- Result: 31 + (-7)



REPZ(REPE), REPNZ(REPNE) & REP


The REP instruction and its variations are used to repeat operations up to ECX times. Each time the secondary instruction is performed, ECX is decremented. Repeat while Zero (REPZ) and Repeat while Equal (REPE) are interchangeable and keep executing the input instruction while ECX != 0 and ZF = 1. Repeat while Not Zero (REPNZ) and Repeat while Not Equal (REPNE) execute the input instruction while ECX != 0 and ZF != 1.

We'll look at examples at how these can be combined with SCAS to deal with repetitious comparisons in the next subsection.



SCAS


SCAS comes in 3 flavours: SCASB, SCASW, SCASD, that operate at 1-,2-, or 4-byte granularity respectively. SCAS implicitly compares AL/AX/EAX with data starting at the memory address EDI; EDI is automatically incremented/decremented depending on the Direction Flag (DF).

Consider the following example:

    mov edi, input       ; assume input is a user defined string
    mov ecx, 05h         ; move 0x5 to ecx
    mov al, 041h         ; move 0x41 to al (0x41 = 'A')
    repne scasb          ; repeatedly compares until ECX = 0 or 'A' is reached

Let "input" point to the start of an arbitrary string. The program will compare this string, a byte at a time, with 'A', each time decremeting ECX. The program terminates either when ECX reaches 0 or when an 'A' is found since the latter will set the Zero Flag, whichever comes first. At this point you've probably figured out that the combination of REP and SCAS will be at the center of the solution to improving strlen(), and hence of this tutorial.



The New and Improved strlen()


We now put the newly-learnt techniques to good use:

;nasm -fwin32 strlen_improved.asm
;GoLink /entry _main strlen_improved.obj
;Run under a debugger      

global _main

section .data
    input db "What is the length of this string?",0   ; string to compute length on

section .text
_main:
    xor    ecx, ecx      ; ecx = 0x00000000
    not    ecx           ; initialize ecx to largest value possible (4,294,967,295 in 32-bit)
    xor    al, al        ; al = 0x00
    mov    edi, input    ; edi points to start of string
    mov    ebx, edi      ; store original pointer
    repne  scasb         ; repeatedly compare bytes
    sub    edi, ebx      ; subtract to get length + 1
    dec    edi           ; decrement edi
done:
    int    3             ; debugger interrupt

not ecx
Performs a NOT on 0x00000000, returning 0xFFFFFFFF which is 4,294,967,295 in decimal form, the highest possible 32-bit unsigned integer. This is to avoid REPNE from quitting due to ECX hitting zero before reaching the end of the string.
xor al, al
Zeros out AL which will be used by SCASB to compare bytes from the input string with. Remember that to find the length of the string we need to iterate over each byte until the NULL terminator is encountered
mov ebx, edi
Stores the address of the start of the string. This will be used to calculate the final result.
repne scasb
Roughly translates to : while (ZF != 0 OR ECX != 0) { compare byte with 0x00; increment EDI; decrement ECX}. The Zero Flag is set when AL and the current character that is being compared to are equal, i.e. when 0x00 is encountered.
sub edi, ebx
Subtracts the position of the NULL from the start of the string.
dec edi
Decrements the result to account for NULL.

Generate the executable using NASM and GoLink:

Command Prompt
C:\>nasm -fwin32 strlen_improved.asm C:\>GoLink /entry _main strlen_improved.obj GoLink.Exe Version 1.0.1.0 - Copyright Jeremy Gordon 2002-2014 - JG@JGnet.co.uk Output file: strlen_improved.exe Format: Win32 Size: 1,536 bytes C:\>


Attach a debugger and look at the result saved in EDX.




Further improvements ?


Of course !! Saving the original pointer to the string, subtracting the new value from it, and decrementing it again is not the most efficient way of obtaining the final result.

At the start of the program we've initialized ECX to all 1s, i.e. the maximum unsigned integer, but this can also be interpreted as -1 from a signed-integer point of view. Also, each REPNE iteration decremented this value by 1. At the end we get ECX = - strlen - 2. Why -2? One from the counted string terminator and another from ECX being initialized to -1.

An interesting property of NOT is that given any negative number X, NOT X = |X| - 1. Combining these 2 facts we can calculate the length of the string directly from ECX in the following manner:

not  ecx  ; ecx = - strlen - 2, i.e. not ecx = |- strlen - 2| - 1  = strlen + 2 - 1  =  strlen + 1 
dec  ecx  ; ecx =   strlen + 1, i.e. dec ecx = strlen !!

Adding this new improvement we get :


;nasm -fwin32 strlen_improved_2.asm
;GoLink /entry _main strlen_improved_2.obj
;Run under a debugger      

global _main

section .data
    input db "What is the length of this string?",0   ; string to compute length on

section .text
_main:
    xor    ecx, ecx      ; ecx = 0x00000000
    not    ecx           ; initialize ecx to -1 (signed int)
    xor    al, al        ; al = 0x00
    mov    edi, input    ; edi points to start of string
    repne  scasb         ; repeatedly compare bytes
    not    ecx           ; ecx = strlen + 1
    dec    ecx           ; account for 0x00
done:
    int    3             ; debugger interrupt

Conclusion


Congratulations to those of you who have made it through all the tutorials thus far. As a teaser, in the next post we'll combine today's ASM program with the one presented in the first to construct a finalized, presentable version of strlen() .. バイバイ

Tuesday 14 July 2015

(N)ASM101 - 0x02 - Constructing strlen()

In the previous post we've looked at some of the most commonly used registers, the stack, some x86 instructions, and how to call a higher level Win32 API function. We've combined these to create an ASM program which displays a message box with a custom message and title.

In this post we'll look at some more interesting x86 instructions and construct the strlen() C++ function from scratch.

strlen()


The function expects a pointer to a null-terminated string and returns the length, i.e. the number of characters that make up the string without including the terminating null character. As a trivial example let's consider the string "HELLO" shown below:
As portrayed in the image, the string "HELLO" constitutes of 5 characters and the NULL character, hence passing a pointer to this string to strlen() returns 5.


x86 Instructions


To keep the theory at a minimum, only instructions required to achieve our goal, and some of their variations, will be covered in this section.

MOVe


The MOV instruction is probably the most common ASM instruction. As the name suggests it moves data from one place to another. To be more precise it copies data across rather than move it, since the source value remains intact. The operands can be a register (e.g. EAX), a memory location (e.g. [EAX]; the memory location pointed at by EAX) or an immediate value (e.g. 0FFh). The following are some examples of such:
mov  eax, F003h       ; immediate -> reg ;  EAX  =  0xF003
mov  [eax], F003h     ; immediate -> mem ; [EAX] =  0xF003
mov  ebx, eax         ; reg -> reg       ;  EBX  =   EAX
mov  eax, [ebx]       ; mem -> reg       ;  EAX  =  [EBX]
mov  [ebx], eax       ; reg -> mem       ; [EBX] =   EAX
mov  eax, [eax+ebx*4] ; reg -> reg       ;  EAX  =[EAX+EBX*4]
Notice that data moves from right to left. This is the Intel syntax, and we'll be sticking exclusively to it. For those of you who are interested, the other major syntax is the AT&T syntax and is predominantly used in *nix environments.


ADD/SUB & INC/DEC


Arithmetic operations can be performed on registers and memory locations. We can add two registers with ADD, subtract an immediate value from a memory location with SUB, increment values with INC and decrement values with DEC.
add esp, 14h            ;  esp  =  esp  + 0x14
add ecx, eax            ;  ecx  =  ecx  + eax
sub ecx, eax            ;  ecx  =  ecx  - eax
sub esp, 0Ch            ;  esp  =  esp  - 0xC
inc eax                 ;  eax  =  eax  + 1
dec eax                 ;  eax  =  eax  - 1
inc dword [eax]         ; [eax] = [eax] + 1
add word [eax], 0FFFFh  ; [eax] = [eax] + 0xFFFF
add dword [eax], 0FFFFh ; [eax] = [eax] + 0xFFFF
Notice that when the destination is a memory location, the size of the memory that we want to deal with has to be explicitly specified. Let's see why this is important.

Consider the last 2 of the examples above which at first glance look deceptively equivalent. Say that EAX points to an arbitrary location in memory. Also say that if we read a dword in length starting from this location, we retrieve 0x41424344. This means that if we read a word in length starting from the same location, we get 0x4344. So:
 add dword [eax], 0FFFFh  => dword [eax] = 0x41424344 + 0xFFFF
                          => dword [eax] = 0x41434343 & word [eax] = 0x4343 & CF=0

 add  word [eax], 0FFFFh  => word  [eax] =     0x4344 + 0xFFFF
                          => dword [eax] = 0x41424343 & word [eax] = 0x4343 & CF=1
Notice the differences. For the word-case, the most significant part is untouched since we explicitly told it to consider a word in length and the Carry Flag has been set to 1 as the result is larger than a word.


OR/XOR


These instructions perform bitwise (eXclusive)OR of the operands and place the result in the first operand specified. The following are some self-explanatory examples:
or ebx, ebx            ;  ebx  =  ebx  |  ebx
xor eax, [ecx]         ;  eax  =  eax  ^ [ecx]
or [edx], eax          ; [edx] = [edx] |  eax
xor eax, 0fh           ;  eax  =  eax  ^  0xF
or word [eax], 0FFFFh  ; [eax] = [eax] | 0xFFFF
As in the previous case, when dealing with memory locations, the size has to be explicitly specified. XOR has 2 interesting properties: XORing twice with the same value yields the original value and XORing a value with itself results in zeroes. The latter is used to clear a register whereas the former is sometimes used as a simple shellcode obfuscating technique to evade Antivirus programs.


TEST & CMP


TEST performs a bitwise AND but does not save the result and CMP performs a SUB without saving the result. These operations are used to set the appropriate flags for subsequent operations which may perform different actions depending on these flags.
cmp  eax, ecx              ;  eax   -  ecx
test [ebx], eax            ; [ebx]  &  eax
cmp edx, 0FFh              ;  edx   -  0xFF
test dword [eax], 0ABCDh   ; [eax]  & 0xABCD 
Once again make sure the length is specified when dealing with memory locations.


JZ(JE), JNZ(JNE) & JMP


Jumps are used to control the flow of an ASM program just like "if..then..else" and "switch" for higher-level programming languages. Jump if Zero (JZ) and Jump if Equal (JE) are interchangeable since the Zero Flag is set to 1 when two equal values are compared, and same for Jump if Not Zero (JNZ) and Jump if Not Equal (JNE). JMP is an unconditional jump and transfers the control flow irrelevant of the flag values.
jmp procedure   ; jump to "procedure"
je  procedure   ; jump to "procedure" if ZF=1
jz  procedure   ; jump to "procedure" if ZF=0


INT


INT generates a software interrupt and takes a single hexadecimal value. We will not go through the numerous available interrupts but we talk briefly on the following: 0x21 and 0x3. INT 0x21 calls an MS-DOS API call depending on the values in the registers. INT 0x3 (0xCC) is used by debuggers to set a breakpoint.
int 3 ; execution breaks here if attached to a debugger

A complete Intel Syntax Reference guide can be found here.


Constructing strlen()


We've looked at how strlen() works and familiarized ourselves with more than enough ASM instructions to reconstruct the function from scratch. strlen() may be very simple and straightforward but let us decompose it into smaller, digestible operations. This will help us understand the underlying logic and will make it easier to translate to ASM.

strlen() Algorithm:
  1. Set CTR = 0
  2. Is char at position CTR = 0x00 ?
  3. If Yes GOTO 6
  4. Increment CTR
  5. GOTO 2
  6. DONE: Return CTR
Without further ado:
;nasm -fwin32 strlen.asm
;GoLink /entry _main strlen.obj
;Run under a debugger

global _main

section .data
    input db "What is the length of this string?",0   ; string to compute length on

section .text
_main:
    xor     ecx, ecx          ; clear registers to be used
    xor     ebx, ebx          ; 
    mov     edi, input        ; move pointer to start of input to edi
check_next:
    mov     bl, [edi+ecx]    ; move char to bl
    test    bl, bl           ; check if char = 0x00 (i.e. end of string)
    jz      done             ; if ZF=1 (i.e. bl=0x00), jump to done
    inc     ecx              ; increment ecx (counter)
    jmp     check_next       ; jmp to check_next
done:
    int     3                ; debugger interrupt
xor ???, ???
Zeroes out the registers before use. ECX is used both as a counter and a pointer to the next character while the lower part of EBX holds the character we are comparing to.
mov edi, input
Moves the pointer to the start of the string to EDI, i.e. [EDI] = "W".
check_next
A symbolic label used as a reference by jump instructions.
mov bl, [edi+ecx]
Moves the char pointed to by [EDI+ECX] to the BL register.
test bl, bl
Reminder: TEST performs a bitwise AND without storing the result. The only value that returns zero when ANDed to itself, is zero. So, if BL = 0x0 then BL & BL = 0x0, hence ZF=1. If BL <> 0x0, then BL & BL <> 0x0, hence ZF=0.
jz done
If ZF=1, i.e. the previous result was 0x0, i.e. we hit the end of the string, jump to the "done" label.
inc ecx
Increment ECX which holds a counter to the string length.
jmp check_next
Jump to the "check_next" label to repeat the process with an incremented counter and a pointer to the next character.
int 3
A breakpoint for debuggers. This will give us the opportunity to take a look at the ECX register which, at the point it reaches the interrupt, should contain the length of the string in hexadecimal format.

Generate the executable using NASM and GoLink:

Command Prompt
C:\>nasm -fwin32 strlen.asm C:\>GoLink /entry _main strlen.obj GoLink.Exe Version 1.0.1.0 - Copyright Jeremy Gordon 2002-2014 - JG@JGnet.co.uk Output file: strlen.exe Format: Win32 Size: 1,536 bytes C:\>

As you might have realized by now, we can't just run the executable. We need to attach it to a debugger to be able to view the result in the ECX register. In a future post I might demonstrate how to format a registry value and display it in a message box. It's more involved than it sounds.

The following screenshot shows the status of the registers after the debugger hits the INT instruction:


ECX contains 0x22 which translates to 34, the length of "What is the length of this string?". Play around with it; modify the string and run it step by step to get a better feel for ASM.



Conclusion


A lot was covered in this post. We've looked at how higher-level functions are constructed using basic assembly instructions. In the next post we won't be creating anything new but rather we'll be improving upon what we've covered here .. さようなら