## Correct sorting with Frama-C and some thoughts on Formal Methdos

12 02 2011

A couple of years ago, during my masters on Formal Methods I have been working with automatic provers and I also used Frama-C, this is a tool that allow the user to prove C code directly in the source code, using a special notation in the comments, called ACSL notation.

Frama-C allows you to make two kinds of proofs, security and safety ones. The safety ones are related with arrays index out of bounds access, and so. This kind of proofs are related to the language itself and they are easy to do if you use loop invariants, pre and post conditions.
If you use a high level language, like JAVA you won’t have almost none safety problems.
Because C is too close to machine level code, we can do things that we do not intend (or maybe we do and we use C exactly because it allows this kind of things). For example:

// foo.c file
#include <stdio.h>

int main() {
char *a = "I like you";
char *b = "I hate you";

if(&a < &b) a = *(&a + 1);
else        a = *(&a - 1);

printf("%s\n", a);
}


As you can see, I never used the $b$ variable for nothing, just have declared it. And the result is:

[ulissesaraujocosta@maclisses:c]-$gcc -o foo foo.c [ulissesaraujocosta@maclisses:c]-$ ./foo
I hate you


This lack of security of language C is one of the reasons we need to write safety statements. Of course this kind of things is why C is so fast and powerful, the person in charge is always the programmer. If you are interested in this kind of tricks and want to understand more about this and smashing the stack and so, feel free to read more posts in my blog about this subject.

The other kind of statements (security ones) are related to the functionality of the program and that’s basically where the problem or the effort is, I will talk about this later on. First let’s see the algorithm and the implementation in C.

## Code

The algorithm I use here is just a simple example. I used bubble sort, this is a sort algorithm not very efficient, but it uses none more memory then the needed to store the structure you want to sort.
To get a visual understanding of the algorithm (and to see it inefficiency) check out this youtube video.

This is the implementation of the algorithm:

void swap(int *i, int *j) {
int tmp = *i;
*i = *j;
*j = tmp;
}

void bubbleSort(int *vector, int tam) {
int j, i;
j = i = 0;
for(i=0; i<tam; i++) {
for(j=0; j<tam-i-1; j++) {
g_swap = 0;
if (vector[j] > vector[j+1]) {
swap(&vector[j],&vector[j+1]);
}
}
}
}


## Pre, Post conditions and thinking formally

So, as you can see in the video (or in the code) the algorithm is pretty much simple, we pick the $i$ element cross the array $n$ times and for each element we compare with $i$, this $n$ times.

We have as pre conditions: The size of the $vector$ must be greater than zero, and all the positions in that vector exists, so in Frama-C we use the $valid\_range(vector, i, j)$, where $i$ and $j$ are indexes of the $vector$ to say that all elements exist.

$tam > 0$

$valid\_range(vector,0,tam-1)$

Ans as pos conditions we must ensure that the array is sorted ( I will talk predicate this later).
You may think that this by itself is enough to make a complete proof, but you are wrong. Image that my function clear all the elements in the array and fill the array with $\{1,2,..,tam\}$, our code will be proved and its wrong!

So, we need to say more… First thing that can pop to your head is OK, we will say that we have the same numbers in the beginning and in the end and you write this:
$\forall_a : 0 \leq a < tam : (\exists_b : 0 \leq b < tam : old(vector(b)) \equiv vector(a))$

In fact this is closer (not yet right), imagine that you give as input:
$\{4,7,9,1,0,3,4\}$. If your code returns $\{0,1,3,4,7,9\}$ (we miss the repeated $4$) the code will be proved.
So, the solution if to make a $Permut$ predicate and prove for the multi set.
So, this are the post conditions:

$sorted(vector,0,tam-1)$

$Permut\{Old,Here\}(vector,0,tam-1);$

Frama-C is so cool because for example at the pos condition if we want to refer to the state in the beginning (before call the function) we use $Old$ and if we want to refer to the moment after the call we heave the $Here$ keyword, remember we are at the post condition, so this wil be executed in the end (so $Here$ means the end of the function call).

## Predicates

So, here is the $Sorted$ predicate. Predicates receive a state $L$ and the parameters (just like a function) and they return bool values (true or false). Inside we use regular ACSL notation. Here I define that for an array to be sorted each element must be less or equal to the next one.

/*@ predicate Sorted{L}(int a[], integer l, integer h) =
@   \forall integer i; l <= i < h ==> a[i] <= a[i+1];
@*/


The $Permut$ is defined inductively, so we receive two states $L1$ and $L2$ and the array $a$ and the range where we want to permute.
We write multiple rules for the permutation, reflection, symmetry, transitivity and finally the most important one, the $Swap$. So basically here we say that a permutation is a set of successive swaps.

/*@ inductive Permut{L1,L2}(int a[], integer l, integer h) {
@  case Permut_refl{L}:
@   \forall int a[], integer l, h; Permut{L,L}(a, l, h) ;
@  case Permut_sym{L1,L2}:
@    \forall int a[], integer l, h;
@      Permut{L1,L2}(a, l, h) ==> Permut{L2,L1}(a, l, h) ;
@  case Permut_trans{L1,L2,L3}:
@    \forall int a[], integer l, h;
@      Permut{L1,L2}(a, l, h) && Permut{L2,L3}(a, l, h) ==>
@        Permut{L1,L3}(a, l, h) ;
@  case Permut_swap{L1,L2}:
@    \forall int a[], integer l, h, i, j;
@       l <= i <= h && l <= j <= h && Swap{L1,L2}(a, i, j) ==>
@     Permut{L1,L2}(a, l, h) ;
@ }
@
@ predicate Swap{L1,L2}(int a[], integer i, integer j) =
@      \at(a[i],L1) == \at(a[j],L2)
@   && \at(a[j],L1) == \at(a[i],L2)
@   && \forall integer k; k != i && k != j ==> \at(a[k],L1) == \at(a[k],L2);
@*/


So, as you can see the bubble sort function itself have 18 lines of code, and in the end with the annotations for the proof we end with 90 lines, but we proved it!

## Thoughts

My main point here is to show the thinking we need to have if we want to prove code in general. Pick what language you want, this is the easiest way you will have to prove software written in C. Sometimes if your functions are too complex you may need to prove it manually. The problem is not on the Frama-C side, Frama-C only generates the proof obligations to feed to automatic provers, like Yices, CVC3, Simplify, Z3, Alt-Ergo and so.

My point here is to show the cost of proving software. Proving software, specially if the language is too low level (like C – you need to care about a lot more things) is hard work and is not easy to a programmer without theoretical knowledge.
On the other side, you end up with a piece of software that is proved. Of course this proof is always requirements oriented, ny that I mean: if the requirements are wrong and the program is not doing what you expect the proof is along with that.
I do not stand to proof of all the code on the planet, but the proper utilization of FM (formal methods) tools for critical software.

I steel been using Frama-C since I learned it in 2009, nowadays I use it for small critical functions (because I want, I’m not encouraged to do so) and I have to say that the use of FM in the industry is far. As I told you Frama-C is the easiest automatic proof tool you will find at least that I know.

Talking with Marcelo Sousa about the use of FM in industry, we came to the conclusion that the people that are making this kind of tools and have the FM knowledge don’t make companies. I think if more brilliant people like John Launchbury make companies, definitely FM will be more used.

## Source code

Here is all the code together if you want to test it:

// #include <stdio.h>

/*@ predicate Sorted{L}(int a[], integer l, integer h) =
@   \forall integer i; l <= i < h ==> a[i] <= a[i+1];
@
@ predicate Swap{L1,L2}(int a[], integer i, integer j) =
@      \at(a[i],L1) == \at(a[j],L2)
@   && \at(a[j],L1) == \at(a[i],L2)
@   && \forall integer k; k != i && k != j ==> \at(a[k],L1) == \at(a[k],L2);
@*/

/*@ inductive Permut{L1,L2}(int a[], integer l, integer h) {
@  case Permut_refl{L}:
@   \forall int a[], integer l, h; Permut{L,L}(a, l, h) ;
@  case Permut_sym{L1,L2}:
@    \forall int a[], integer l, h;
@      Permut{L1,L2}(a, l, h) ==> Permut{L2,L1}(a, l, h) ;
@  case Permut_trans{L1,L2,L3}:
@    \forall int a[], integer l, h;
@      Permut{L1,L2}(a, l, h) && Permut{L2,L3}(a, l, h) ==>
@        Permut{L1,L3}(a, l, h) ;
@  case Permut_swap{L1,L2}:
@    \forall int a[], integer l, h, i, j;
@       l <= i <= h && l <= j <= h && Swap{L1,L2}(a, i, j) ==>
@     Permut{L1,L2}(a, l, h) ;
@ }
@*/

/*@ requires \valid(i) && \valid(j);
@ //assigns *i, *j; //BUG 0000080: Assertion failed in jc_interp_misc.ml
@ ensures \at(*i,Old) == \at(*j,Here) && \at(*j,Old) == \at(*i,Here);
@*/
void swap(int *i, int *j) {
int tmp = *i;
*i = *j;
*j = tmp;
}

/*@ requires tam > 0;
@ requires \valid_range(vector,0,tam-1);
@ ensures Sorted{Here}(vector, 0, tam-1);
@ ensures Permut{Old,Here}(vector,0,tam-1);
@*/
void bubbleSort(int *vector, int tam) {
int j, i;
j = i = 0;
//@ ghost int g_swap = 0;

/*@ loop invariant 0 <= i < tam;
@ loop invariant 0 <= g_swap <= 1;
//last i+1 elements of sequence are sorted
@ loop invariant Sorted{Here}(vector,tam-i-1,tam-1);
//and are all greater or equal to the other elements of the sequence.
@ loop invariant 0 < i < tam ==> \forall int a, b; 0 <= b <= tam-i-1 <= a < tam ==> vector[a] >= vector[b];
@ loop invariant 0 < i < tam ==> Permut{Pre,Here}(vector,0,tam-1);
@ loop variant tam-i;
@*/
for(i=0; i<tam; i++) {
//@ ghost g_swap = 0;
/*@ loop invariant 0 <= j < tam-i;
@ loop invariant 0 <= g_swap <= 1;
//The jth+1 element of sequence is greater or equal to the first j+1 elements of sequence.
@ loop invariant 0 < j < tam-i ==> \forall int a; 0 <= a <= j ==> vector[a] <= vector[j+1];
@ loop invariant 0 < j < tam-i ==> (g_swap == 1) ==> Permut{Pre,Here}(vector,0,tam-1);
@ loop variant tam-i-j-1;
@*/
for(j=0; j<tam-i-1; j++) {
g_swap = 0;
if (vector[j] > vector[j+1]) {
//@ ghost g_swap = 1;
swap(&vector[j],&vector[j+1]);
}
}
}
}

/*@ requires \true;
@ ensures \result == 0;
@*/
int main(int argc, char *argv[]) {
int i;
int v[9] = {8,5,2,6,9,3,0,4,1};

bubbleSort(v,9);

//     for(i=0; i<9; i++)
//         printf("v[%d]=%d\n",i,v[i]);

return 0;
}


If you are interested in the presentation me and pedro gave at our University, here it is:

## Splint the static C code checker

3 05 2009

Splint is this great tool to statically analyze C code. Splint is an evolution of Lint. Lint makes analysis of C code and verifies unused declarations, type inconsistencies, use before definitions, unreachable code, ignored return values, execution path with no return and infinite loops. However, Lint was not a sufficiently powerful tool. Splint was created as a descendant of Lint allowing annotations in C code, so, it do more checks than its predecessor. In this post I will present some verifications that Splint does, and show the philosophy and how to use it. Splint allows annotations on functions, variables, parameters and types.

## Undefined Variables

Splint detects instances where the value of a variable is used before it is defined. Anyway, many static analyzers can also do this, it is a very basic check. However, thanks to the splint annotations supports, they can be used to describe what storage must be defined and what storage must be undefined at interface points, this means that all storage reachable from a global variable, parameter to a function, or function return value is defined before and after the function call.

A special case of undefined variables is undefined function parameters. Sometimes we do a certain C function that return values, or change parameters values. The out annotation denotes a pointer to storage that may be undefined. Splint have a great storage model, so does not report an error when a pointer to allocated but undefined storage is passed as an out parameter. If in the body of a function an outparameter is allocated but not bounded to a value Splint reports an error. You can see out as an parameter that will be defined inside a function. The opposite happens to the in annotation, this can be used to denote a parameter that must be completely defined.
Here is a small example of it:

extern void setVal (/*@out@*/ int *x);
extern int getVal (/*@in@*/ int *x);
extern int mysteryVal(int *x);

int dumbfunc(/*@out@*/ int *x, int i) {
if (i > 3)
11         return *x;
else if (i > 1)
13         return getVal (x);
else if (i == 0)
15         return mysteryVal (x);
else {
18         setVal (x);
19         return *x;
}
}


And here it is the output of Splint:

> splint usedef.c
usedef.c:11: Value *x used before definition
usedef.c:13: Passed storage x not completely defined
(*x is undefined): getVal (x)
usedef.c:15: Passed storage x not completely defined
(*x is undefined): mysteryVal (x)
Finished checking --- 3 code warnings


As you can see, no error is reported for line 18, because the incomplete defined storage x is passed to a function that will define it, so we can return *x with no problems.

## Types

Strong type checking often reveals programming errors. Splint can check primitive C types more strictly and flexibly than typical compilers.

C, does not make types verification. Because for C all basic non numerical types are just int’s (Enum, Pointer, Char). Splint supports stricter checking of built-in C types. The char and enum types can be checked as distinct types, and the different numeric types can be type-checked strictly. Also the primitive char type can be type-checked as a distinct type. If char is used as a distinct type, common errors involving assigning ints to chars are detected.

In Splint, you can always turn off some default verifications, and Splint will assume the C compiler strategy to verify the code. If you run Splint with +charint is on, char types are indistinguishable from ints. So you can make cast’s from int to char or from cahr to int. But never forgot that this can lead to errors, as you can imagine if you do (char) 1000 you will not get an expected result because the characters only go up to 256.

Splint reports errors in the use of Enums, if a value that is not an enumerator member is assigned to the enum type or if an enum type is used as an operand to an arithmetic operator. Anyway you can turn this off, activating the enumint flag and then you can use enums and ints types interchangeably.

## Memory Management

About half the bugs in typical C programs can be attributed to memory management problems.

Any C programmer who has already made a reasonable size program, has been confronted with unexpected errors related with memory management problems. This happens basically because C is a low level language so, does not have any system like Java’s garbage collector. All the memory management have to be done by the programmer, so errors may append. Some only appear sporadically, and some may only be apparent when compiled on a different platform. Splint is able to detect many memory management errors at compile time, like: using storage that may have been deallocated, memory leaks or returning a pointer to stack-allocated storage. This is possible because Splint makes a memory model when is working on the code, anyway I will not talk about this subject.

Splint can see and report deallocating storage when there are other live references to the same storage and failing to deallocate storage before the last reference to it is lost. The solution here is simple, Splint makes an obligation to release storage and attach this obligation to the reference to which the storage is assigned.

For references not shared Splint uses the only annotation to indicate that a reference is the only pointer to the object it points to, here we also use the null annotation to say that the output of malloc could be NULL:

/* @only@ */ /* @null@ */ void * malloc ( size_t size );


Here is a simple example of C annotated code with memory leaks and use after free variables:

1 extern /*@only@*/ int *glob;
/*@only@*/ int *f (/*@only@*/ int *x, int *y, int *z) /*@globals glob;@*/ {
8    int *m = (int *)
9    malloc (sizeof (int));
11   glob = y;   //Memory leak
12   free (x);
13   *m = *x;   //Use after free
14    return z;   //Memory leak detected
}


Here is the Splint output for the file above:

> splint only.c
only.c:11: Only storage glob (type int *) not released before assignment: glob = y
only.c:1: Storage glob becomes only
only.c:11: Implicitly temp storage y assigned to only: glob = y
only.c:13: Dereference of possibly null pointer m: *m
only.c:8: Storage m may become null
only.c:13: Variable x used after being released
only.c:12: Storage x released
only.c:14: Implicitly temp storage z returned as only: z
only.c:14: Fresh storage m not released before return
only.c:9: Fresh storage m allocated


Splint errors and warnings are very human readable, so you just have to read them to understand.
The first warning says that variable glob was not released (we say, in line 1, that it is only) before the assignment. In line 13 we have a dereference of possibly null pointer, because in line 12 we have done free to x variable, an now we want to use it’s value. As you can see, is very easy to understand the output of Splint.

Splint can also make verifications in stack based references. A memory error occurs if a pointer into stack is live after the function returns. Splint detects errors involving stack references exported from a function through return values or assignments to references reachable from global variables or actual parameters.
No annotations are needed to detect stack reference errors. It is clear from declarations if storage is allocated on the function stack.

Here is an example of stack-allocated storage:

int *glob;

int *f (int **x) {
int sa[2] = { 0, 1 };
int loc = 3;
9   glob = &loc;
10  *x = &sa[0];
12  return &loc;
}


And here is the Splint output:

> splint stack.c
stack.c:12: Stack-allocated storage &loc reachable from return value: &loc
stack.c:12: Stack-allocated storage *x reachable from parameter x
stack.c:10: Storage *x becomes stack
stack.c:12: Stack-allocated storage glob reachable from global glob
stack.c:9: Storage glob becomes stack


## Control Flow

In order to avoid certain errors, Splint have to understand the control flow of the program, so Splint do some checks related to control flow. Many of these checks are possible because of the extra information that is known in annotations. Without this additional information Splint assumes that all functions return and execution continues normally.

noreturn annotation is used to denote a function that never returns.

extern /* @noreturn@ */ void fatalerror ( char *s);


We also have maynoreturn and alwaysreturns annotations, but Splint must assume that a function returns normally when checking the code and doesn’t verify if a function really returns.

To describe non-returning functions the noreturnwhentrue and noreturnwhenfalse mean that a function never returns if the first argument is true or false.

/* @noreturnwhenfalse@ */ void assert (/* @sef@ */ bool /* @alt int@ */ pred );


The sef annotation denotes a parameter as side effect free, and the alt int indicate that it may be either a Boolean or an integer.

### Undefined Behavior

The order which side effect take place in C is not entirely defined by the code.

A sequence point is some point in the C code where it is guaranteed that all side effects of previous evaluations have been performed.
An example of sequence points is:

• a function call (after the arguments have been evaluated)
• at the end of a if, while, for or do statement
• a &&, || and ?

Here is a simple C program that have undefined behavior.

extern int glob;
extern int mystery (void);
extern int modglob (void) /*@globals glob@*/ /*@modifies glob@*/;

int f (int x, int y[]) {
11    int i = x++ * x;
13    y[i] = i++;
14    i += modglob() * glob;
15    i += mystery() * glob;
16    return i;
}


And here is the Splint output:

> splint order.c
order.c:11: Expression has undefined behavior (value of right operand modified by left operand): x++ * x
order.c:13: Expression has undefined behavior (left operand uses i, modified by right operand): y[i] = i++
order.c:14: Expression has undefined behavior (value of right operand modified by left operand): modglob() * glob
order.c:15: Expression has undefined behavior (unconstrained function mystery used in left operand may set global variable glob used in right operand): mystery() * glob


Regard control flow, Splint has more options to check C code. I will not talk about all of them here.

## Buffer Sizes

Buffer overflow errors are a particularly dangerous type of bug in C, they are responsible for half of all security attacks. This is happens because C does not perform runtime bound checking (for performance reasons), and so attackers can exploit program bugs to, for example, gain full access to a machine.

Splint models blocks of memory using two properties:
maxSet, maxSet(b) denotes the highest address beyond b that can be safely used as lvalue, for instance:

char buffer[MAXSIZE] we have maxSet(buffer) = MAXSIZE - 1


and maxRead, maxRead(b) denotes the highest index of a buffer that can be safely used as rvalue.

When a buffer is accessed as an lvalue, Splint generates a precondition constraint involving the maxSet property and when a buffer is accessed as an rvalue, Splint generates a precondition constraint involving the maxRead property.

We can use this two properties in buffer sizes annotations, this annotations uses the clause requires and ensures just like JML, VDM and Frama-C. When a function with requires clause is called, the call site must be checked to satisfy the constraints implied by requires.

Here is an example of its use:

void /* @alt char * @*/ strcpy
(/* @out@ */ /* @returned@ */ char *s1 , char *s2)
/* @modifies * s1@ */
/* @requires maxSet (s1) >= maxRead (s2) @*/


This is a possible annotation for strcpy standard library C function. We say that s1 is the return value of the function (returned), and that the pointer to s1 is the only one (only). We also say that this function modifies s1 and we specify the pre and post conditions.

Splint can also do bound checking. This is more complex than other checks done by Splint, so, memory bound warnings contain extensive information about the unresolved constraint as you can see in this example:

int buf [10];
buf [10] = 3;

setChar .c :5:4: Likely out -of - bounds store : buf [10]
Unable to resolve constraint : requires 9 >= 10
needed to satisfy precondition : requires maxSet ( buf @ setChar .c :5:4) >= 10


## The ultimate test: wu-ftpd

wu-ftpd version 2.5.0 was about 20.000 lines of code and took less than four seconds to check all of wu-ftpd on a 1.2-GHz Athlon machine. Splint detected the known flaws as well as finding some previously unknown flaws (!)

One of the problems of these automatic static analysis tools is that it can produce false problems.
Running Splint on wu-ftpd without adding annotations produced 166 warnings for potential out-of-bounds writes. After adding 66 annotations, it produced 101 warnings: 25 of these indicated real problems and 76 were false.

## Pros and Cons

Pros:

• Lightweight static analysis detects software vulnerabilities
• Splint definitely improves code quality
• Suitable for real programs…

Cons:

• …although it produces more warning messages that lead to confusion
• It won’t eliminate all security risks
• Hasn’t been developed since 2007, they need new volunteers

## Conclusions

No tool will eliminate all security risks, lightweight static analysis tools (Splint) play an important role in identifying security vulnerabilities.

Here is the presentation about that subject:

## Outroduction

All the examples provide here is from the Splint manual.

## Capture and dissect network traffic

1 04 2009

Currently I am doing research at the University of Minho in the group of distributed systems, with duration of one year. My job is to find a way to identify specific links between a user and a distributed system. The general idea is to draw a map of services in a distributed system. This post only refers to the first milestone.

The proposal was to make such a system using Snort.

## Snort

Snort is a Network intrusion detection system, that means with Snort you can detect malicious activity in your network. We can detect many types of network attacks. We can identify DoS, DDoS attacks, port scans, cracking attempts, and much more.

Snort can operate in two different ways. We can set up Snort to run in passive mode, putting it to listen in promiscuous mode. That is, because Ethernet network switches send traffic to all computers connected to itself, we get traffic addressed to other machines on the network. To do this we only need to connect to the network and turn Snort on in our machine, no one knows that we are recording every traffic (including traffic destined for other computers).

Snort may also run in active mode. This “active” is not able to modify the data channel, but to be able to be installed in a network, a router for example and reap more information than in passive mode. Thus it makes sense to use the capacity of rules that Snort supports, to filter the traffic that it read.

To do this, Snort capture all packets that pass the network and interprets each. As the rules we have defined Snort tries to find these patterns in each packet, or each set of packets and take certain actions for each of them.

For example, if a large number of TCP requests reach a particular host, to a large number of ports in a short space of time we probably are the target of a port scan. NIDS like Snort know how to find these patterns and alerting the network administrator.

## Objective

Our aim was to use Snort to capture all traffic into passive mode.

root@pig:# snort -u snort -g snort -D -d -l /var/log/snort -c /etcsnort/snort.debian.conf -i eth0

We are saving the logs in binary (tcpdump format), for that I use the “-d -l /dir/” flags. I prefer to save all the packets into binary because is more easier to parse, than the structure of files and directories that Snort creates by default.

I started by trying to use some language that advised me to try to do the parsing of the file created by snort. Initially started to use python, but only find a tcpdump parser and could not get more than one file translated in tcpdump to hexadecimal.
After that I tried to use Haskell and I was amazed!

House is a Haskell Operative System done by The Programatica Project.

This is a system than can serve as a platform for exploring various ideas relating to low-level and system-level programming in a high-level functional language.

And indeed helped me a lot in doing my job. This project have already done a lot of parsers for network packets. It implements the Ethernet, IPv4, IPv6, TCP, UDP, ICMP, ARP and I think is all.

The libpcap (tcpdump parser) is already implemented in Haskell too, so is very simple to parse a complete packet:

getPacket :: [Word8] -> InPacket
getPacket bytes =  toInPack $listArray (0,Prelude.length bytes-1)$ bytes

-- Ethernet | IP | TCP | X
getPacketTCP :: [Word8] -> Maybe (NE.Packet (NI4.Packet (NT.Packet InPacket)))


## More catamorphisms

Imagine we have the following data type:

data NList a where
Leaf  :: a -> NList a
NNode :: a -> [NList a] -> NList a


$out$, $rec$, and $cata$ became:

out (Leaf a) = Left a
out (NNode a l) = Right (a,l)


Using the previous definitions of $(-|-)$ and $(><)$

rec f = id -|- (id >< map f)

cata g = g . rec (cata g) . out


Imagging that $g$ has type:

g :: Either a (a,[[a]]) -> [a]


And the graphic for this cata became:



## Conclusion

I’ve talked about cata’s without any formalism, the idea was to explain to someone who didn’t know.

I will talk more about catamorphisms and how to calculate programs with them.
In the future I will like to talk about anamorphisms too. And Later on I will talk about point-free over non recursive functions.