Inversing: Writing and compiling a program in C

Hello there!

Today I want to bring you a short post showing how to write a program in C and compile it with GCC on Windows. This post will be useful because we will need to compile the sources that I will share in future technical posts. My goal is not to teach C programming; there is a lot of content about that on the web. Of course, I will help you along the way with references for the source code.

I recommend reading the previous topics:

The GCC

I will not bring the executable here, just the C code for you to compile in your own environment. This way you can better understand how C code turns into assembly code. As I said in the previous topics, it's important to know programming for reverse engineering, even if somebody tells you it isn't. There is no ultimate truth, so I'm open to any comments, tips or criticism.

Well, let's do it! First we have to download GCC; I'm using MinGW to install it. There are a lot of C compilers out there, and I picked MinGW for convenience.

https://sourceforge.net/projects/mingw/files/

Download and install it at "C:\mingw" so it is easy to access. Skip the component checkboxes and just finish the installation; we will install GCC from the console in the next step. Once you have installed it, open CMD or PowerShell and go to the mingw folder.

cd c:\mingw\bin
mingw-get.exe install mingw32-gcc-bin

We will only install the bin component, because we only want to compile our code. Once MinGW finishes the download, we are good to go to write and compile our program.

To use GCC directly, I mean from any folder in the console, you have to add the GCC folder (C:\mingw\bin) to the PATH environment variable.

GCC compiles our source together with some initialization code that is the same for every compilation. As far as I could see, with the standard compilation settings it starts writing our program's assembly at a given address (0x401460). Let's see it in the next example. I wrote this simple C code to test whether our GCC is working:

#include <stdio.h>

int main(void) {
    /* Print a greeting; the trailing \n ends the line. */
    printf("Hello There. VerseInversing!\n");

    return 0;
}

Copy the code above into your favorite editor and save it as "sample.c" in C:\. Now open the console and run the commands:

cd C:\
gcc sample.c -o sample

If everything is ok, you should see a sample.exe created in C:\. Just type its name in the console to execute it.

.\sample.exe

Cool, huh? haha

Well, now we have to see what assembly was generated. I was using OllyDbg, but since x64dbg has become very popular I'm giving it a try. It's a nice debugger that is updated very often and has many features built in. Download it at:

http://x64dbg.com/

Program loading (Very simple approach)

Before we get our hands dirty, let's look at the basics of how program execution works. I'm assuming that you have read the previous topics. To execute a program (a binary file), Windows needs to load the file into memory, so it takes the file and places it in memory. Each file type is interpreted in a different way once it is loaded. An executable file has a header with key information that lets Windows load and execute it properly.

An executable file always has the PE (Portable Executable) header; without it the OS simply can't load the file. The PE header holds vital information that will help our analysis. It is this same information that lets the debugger load the file and show us everything that will help in the debugging process.

The PE header holds a lot of information, such as the EntryPoint. The EntryPoint, as its name suggests, is the offset where execution must begin. This offset belongs to the code (.text) section, where all the code of our program lives. Again, everything in the file is hex data, but when that data sits in the code section, the CPU reads it as instructions to execute. All the rest is hex data too, but it is interpreted in a different way. As I said, it all depends on how you want to interpret the information.

As I said, the executable relies on the OS to run, so it needs the OS libraries to run its code. Unless the coder calls the API by ordinal number, it's easy to see which APIs the program uses by looking at the .idata section. IDATA stands for "Import Data", that is, the imports the executable uses during execution.

On Windows each DLL provides specific functionality, so you can use a program like CFF Explorer to look at the imports of the PE and google each DLL to see what it does. In our case the program uses KERNEL32.dll and msvcrt.dll. KERNEL32.dll is pretty basic, with core functions, and is very common. msvcrt.dll is the C library on Windows, so since we are using C it makes sense to import it.

Anyway, this is a subject for another post; I'm just giving a little intro to it. Let's continue our debugging.

Let's debug

After you have downloaded and installed x64dbg, let's open our freshly compiled program. Well, it starts a little differently from others; not completely, but a little.
It stops inside the ntdll.dll module, but it has already set a breakpoint on the entry point of the executable (the EntryPoint information is in the PE header).

As I said before, the program imports some DLLs to use their functionality. These DLLs run together with our executable, so each of them, and our executable too, is a running module. In this case ntdll.dll was imported by KERNEL32.dll, because that DLL itself uses ntdll.dll, which does the interfacing with the kernel.

Let's continue. In x64dbg, go to the Breakpoints tab and look at the breakpoints:

Shortcuts used in this post:

F9: Run. The program runs to the end unless it hits a breakpoint first.
F8: Step over. Execute step by step, line by line.
F7: Step into. If the current line is a CALL instruction, you go inside the CALL. If you press F8 instead, you skip over the CALL and land on the next line.

Ignore the TLS callback for now; we will not need it in this topic. To disable it, go to Options=>Preferences=>[Tab]Events and uncheck "TLS Callbacks*". You can uncheck it, or simply skip it with F9. I will cover this in future posts.

As you can see, the address 0x4012E0 holds our entry point, so let's run until execution stops at that address: press F9 until you reach it. This is the entry point of programs compiled with GCC, and our own code only starts after all the initialization code. My tip is to step through it to get familiar with the debugger.

But to make it shorter: the call to our program's block is at 0x401280, a CALL instruction to the address 0x401460 (this is generally where GCC starts writing our program's assembly).

So let's do a little analysis. First it sets up the stack frame (see Assembly Basics), then it adjusts the stack frame with an AND followed by a SUB. Next, at 0x401469, it calls sample.sub_401970, which only checks a value at address 0x407028. If you go to the Memory tab you can see (this information comes from the PE header) that the address 0x407000 belongs to the .bss section (uninitialized data). Anyway, this call doesn't do anything crucial, and we know that because we wrote the program.

After this step, it moves the address 0x405064 directly onto the stack. It doesn't use any register to hold this value; it just MOVs it to the stack and calls the puts API from the Windows C library. Now, why does it use puts if we wrote printf? And why is it calling sample.puts?

First question: if you look at the difference between puts and printf, you will see why it uses puts. Reference for puts:

http://www.cprogramming.com/fod/puts.html

As you can see, puts receives a pointer to a string and writes a newline character after the string, and that was exactly what we did in our code, but with printf. So GCC saw that and swapped in the simpler call.

Second question: if you press F7 you will step into the CALL and see that inside it there is just a jump-table entry, a JMP to the original API. No secrets.

That's it. We could see what the compiler can do with our code: it works on it to make it simpler and "faster" in its own way. So coding and then seeing how the compiler handles our code is good for a better understanding of assembly in practice.

I coded a simple program in C that can be bypassed in many ways; the goal is to reach the final function, which prints "You reach the secret stuff!!". I recommend that you do not look at the source, and instead try to understand what the program is doing and how to bypass it through x64dbg (or any other debugger). I will do a post explaining how to do it and how to patch it. Consider it an exercise. (The program is buggy; I will show why in the next post.)

https://github.com/bernardopadua/blog-inversing/blob/master/write-compile-c/main.c

Any doubts, comments, tips or criticism, just tell me. We are all here to learn from each other. I hope this can be useful to someone. I will bring some other analyses here very soon. Thanks! Bye!

Saturday, July 8, 2017