## Permutations in pure functional ActionScript

2 04 2011

I have been working with ActionScript3 and Flex to do apps for Android. I have never used this languages before, so in the beginning I was with my hart open to understand this languages, specially ActionScript3, the only thing I knew about ActionScript is that is used a lot for flash animations and that was it.

At a first glance ActionScript3 seemed to me to be a standard object oriented language, but after I grasp a little bit more and I found that the version 3 has a lot of functional flavor in it.
I started to find more and more examples of it: definition of high-order functions, map, filter and so…

I have to say that I deeply continue to prefer the Haskell notation with concern to high order and the 2 functions, but the point of ActionScript3 was never be totally functional, but incorporate some features of it.

So, outside of work I remember the beautiful definition of permutations in Haskell and I start try to implement the same algorithm in ActionScript3.

perms xs = [x:ps | x <- xs, ps <- perms (xs\\[x])]

For the non Haskell programmers I will explain: We receive the list $xs$ and here use the list comprehension notation, basically we can translate a list comprehension in the form:
$[f~x | x \leftarrow list]$ as a $map$ function this way $map~f~list$. So, that’s basically the first part of our definition. The second one $[x:ps | x \leftarrow list, ps \leftarrow perms(list_2)]$ is the cartesian product of the two lists.
Finally the definition of the function $xs//[x]$ is simply the exclusion of the element $x$ from the list $xs$.

So, I first came out with this solution:

public static function perms(xs:Array):Array {
if(xs.length == 0) {
return [[]];
}
return xs.map(
function(x:*,index:uint,array:Array):* {
var ret:Array = new Array();
var p:Array = perms( exclude(xs,x) );
for(var j:uint=0 ; p[j] ; j++) {
var ps:Array = p[j] as Array;
ps.unshift(x);
ret.push(ps);
}
return ret;
}
);
}

public static function exclude(xs:Array, elem:*):Array {
return xs.filter(
function(item:*,index:uint,array:Array):Boolean { return item != elem; }
);
}


I had to define the $exclude$ function, because it does not exist in ActionScript3 and I understand that.
The code is not as pretty as you expect to be in Haskell or other functional language, but is pretty much the same, as you ca see in the following definition, where I only use $map$.

public static function permss(xs:Array):Array {
if(xs.length == 0) {
return [[]];
}
return xs.map(
function(x:*,index:uint,array:Array):* {
var p:Array = permss( exclude(xs,x) );
return p.map(
function(y:*,_:uint,__:Array):* {
var ps:Array = y as Array;
ps.unshift(x);
return ps;
}
);
}
);
}
public static function exclude(xs:Array, elem:*):Array {
return xs.filter(
function(item:*,index:uint,array:Array):Boolean { return item != elem; }
);
}


9 04 2009

If you lost yourself in this post, I advise you to start in catamorphisms, then anamorphisms and then hylomorphisms.

Like I said before (in those posts) when you write an hylomorphism over a particular data type, that means just that the intermediate structure is that data type.

$\xymatrix@!=2.7cm {A \ar@/_{1.8cm}/[dd]^{[|f,h|]} \ar[rr]^{h} \ar[d]^{[(h)]} & & B \ar[d]^{rec_{[(h)]}}\\C \ar[d]^{(|f|)} \ar@/^{1cm}/[rr]^{out} & \cong & D \ar@/^{1cm}/[ll]^{in} \ar[d]^{rec_{(|f|)}}\\E & & F \ar[ll]^{f}\\}$

In fact that data will never be stored into that intermediate type $C$ or $D$. Because we glue the ana and cata together into a single recursive pattern. $A$ and $E$ could be some data type your function need. With this post I will try to show you more hylomorphisms over some different data types to show you the power of this field.

## Leaf Tree’s

The data type that we going to discuss here is the $LTree$. In Haskell we can represent $LTree$ as:

data LTree a = Leaf a | Fork (LTree a, LTree a)


Is just like a binary tree, but the information is just in the leaf’s. Even more: a leaf tree is a tree that only have leaf’s, no information on the nodes. This is an example of a leaf tree:
$\xymatrix@!=1cm {& & \bullet \ar@/^/[ld] \ar@/^/[rd] &\\& \bullet \ar@/^/[ld] \ar@/^/[rd] & & DATA\\DATA & & \bullet \ar@/^/[ld] \ar@/^/[rd]& \\& DATA & & DATA\\}$

To represent all the hylomorphisms over $Ltree$ we draw the following diagram:
$\xymatrix@!=2.7cm {A \ar@/_{1.8cm}/[dd]^{[|f,h|]} \ar[rr]^{h} \ar[d]^{[(h)]} & & B \ar[d]^{rec_{[(h)]}}\\Ltree~a \ar[d]^{(|f|)} \ar@/^{1cm}/[rr]^{out} & \cong & a + Ltree~a \times Ltree~a \ar@/^{1cm}/[ll]^{in} \ar[d]^{rec_{(|f|)}}\\E & & F \ar[ll]^{f}\\}$

The example I’m going to give is making the fibonacci function using a hylomorphism over this data type. If you remember the method I used before, I’m going to start by the anamorphism $[(h)]$. Before that I’m going to specify the strategy to define factorial. I’m going to use the diagram’s again, remember that type $1$ is equivalent to Haskell $( )$:

$\xymatrix@!=2.7cm {Integer \ar@/_{2.2cm}/[dd]^{[|add,fibd|]} \ar[rr]^{fibd} \ar[d]^{[(fibd)]} & & 1 + Integer \times Integer \ar[d]^{id + [(fibd)] \times [(fibd)]}\\Ltree~1 \ar[d]^{(|add|)} \ar@/^{1cm}/[rr]^{out} & \cong & 1 + Ltree~1 \times Ltree~1 \ar@/^{1cm}/[ll]^{in} \ar[d]^{1 + (|add|) \times (|add|)}\\Integer & & \ + Integer \times Integer \ar[ll]^{add}\\}$

As you can see I’m going to use $Ltree~1$ as my intermediate structure, and I’ve already define the names of my gen functions $add$ to the catamorphism and $fibd$ to the anamorphism. The strategy I prefer, is do all the hard work in the anamorphism, so here the gen $fibd$ for the anamorphism is:

fibd n | n < 2     = i1   ()
| otherwise = i2   (n-1,n-2)


This function combined with the anamorphism, going to generate leaf tree’s with $n$ leaf’s, being $n$ the result of that fib.

Then we just have to write the gen $add$ for the catamorphism. This function (combined with the catamorphism) counts the number of leafs that a leaf tree have.

add = either (const 1) plus
where plus = uncurry (+)


The final function, the fibonacci function is the hylomorphism of those two defined before:

fib =  hyloLTree add fibd


Here is all the auxiliary functions you need to run this example:

inLTree = either Leaf Fork

outLTree :: LTree a -> Either a (LTree a,LTree a)
outLTree (Leaf a)     = i1   a
outLTree (Fork (t1,t2)) = i2    (t1,t2)

cataLTree a = a . (recLTree (cataLTree a)) . outLTree

anaLTree f = inLTree . (recLTree (anaLTree f) ) . f

hyloLTree a c = cataLTree a . anaLTree c

baseLTree g f = g -|- (f >< f)

recLTree f = baseLTree id f


## Lists

The lists that I’m going to talk here, are the Haskell lists, wired into the compiler, but is a definition exist, it will be:

data [a] = [ ] | a : [a]


So, our diagram to represent the hylomorphism over this data type is:
$\xymatrix@!=2.7cm {A \ar@/_{1.8cm}/[dd]^{[|f,h|]} \ar[rr]^{h} \ar[d]^{[(h)]} & & B \ar[d]^{rec_{[(h)]}}\\[C] \ar[d]^{(|f|)} \ar@/^{1cm}/[rr]^{out} & \cong & 1 + C \times [C] \ar@/^{1cm}/[ll]^{in} \ar[d]^{rec_{(|f|)}}\\E & & F \ar[ll]^{f}\\}$

The function I’m going to define as a hylomorphism is the factorial function. So, we know that our domain and co-domain is $Integers$, so now we can make a more specific diagram to represent our solution:

$\xymatrix@!=2.5cm {Integer \ar@/_{2.1cm}/[dd]^{[|mul,nats|]} \ar[rr]^{nats} \ar[d]^{[(nats)]} & & 1 + Integer \times Integer \ar[d]^{1 + [(nats)] \times [(nats)]}\\[Integer] \ar[d]^{(|mul|)} \ar@/^{1cm}/[rr]^{out} & \cong & 1 + Integer \times [Integer] \ar@/^{1cm}/[ll]^{in} \ar[d]^{1 + (|mul|) \times (|mul|)}\\Integer & & 1 + Integer \times Integer \ar[ll]^{mul}\\}$

As you can see I’m going to use $[Integer]$ to represent my intermediate data, and I’ve already define the names of my gen functions $mul$ to the catamorphism and $nats$ to the anamorphism. Another time, that I do all the work with the anamorphism, letting the catamorphism with little things to do (just multiply). I’m start to show you the catamorphism first:

mul = either (const 1) mul'
where mul' = uncurry (*)


As you can see the only thing it does is multiply all the elements of a list, and multiply by 1 when reach the $[]$ empty list.

In the other side, the anamorphism is generating a list of all the elements, starting in $n$ (the element we want to calculate the factorial) until 1.

nats = (id -|- (split succ id)) . outNat


And finally we combine this together with our hylo, that defines the factorial function:

fac = hylo mul nats


Here is all the code you need to run this example:

inl = either (const []) (uncurry (:))

out []    = i1 ()
out (a:x) = i2(a,x)

cata g   = g . rec (cata g) . out

ana h    = inl . (rec (ana h) ) . h

hylo g h = cata g . ana h

rec f    = id -|- id >< f


## Binary Tree’s

Here, I’m going to show you the hanoi problem solved with one hylomorphism, first let’s take a look at the $Btree$ structure:

data BTree a = Empty | Node(a, (BTree a, BTree a))


So, our generic diagram representing one hylomorphism over $BTree$ is:
$\xymatrix@!=2.7cm {A \ar@/_{1.8cm}/[dd]^{[|f,h|]} \ar[rr]^{h} \ar[d]^{[(h)]} & & B \ar[d]^{rec_{[(h)]}}\\BTree~a \ar[d]^{(|f|)} \ar@/^{1cm}/[rr]^{out} & \cong & 1 + a \times~BTree~a \times~BTree~a \ar@/^{1cm}/[ll]^{in} \ar[d]^{rec_{(|f|)}}\\E & & F \ar[ll]^{f}\\}$

There is a well-known inductive solution to the problem given by the pseudocode below. In this solution we make use of the fact that the given problem is symmetrical with respect to all three poles. Thus it is undesirable to name the individual poles. Instead we visualize the poles as being arranged in a circle; the problem is to move the tower of disks from one pole to the next pole in a speciﬁed direction around the circle. The code deﬁnes $H_n.d$ to be a sequence of pairs $(k, d)$ where n is the number of disks, $k$ is a disk number and $d$ are directions. Disks are numbered from $0$ onwards, disk $0$ being the smallest. Directions are boolean values, $true$ representing a clockwise movement and $false$ an anti-clockwise movement. The pair $(k, d)$ means move the disk numbered $k$ from its current position in the direction $d$.

excerpt from R. Backhouse, M. Fokkinga / Information Processing Letters 77 (2001) 71–76

So, here, I will have a diagram like that, $b$ type stands for $Bool$ and $i$ type for $Integer$:
$\xymatrix@!=2.8cm {b \times i \ar@/_{1.8cm}/[dd]^{[|f,h|]} \ar[rr]^{h} \ar[d]^{[(h)]} & & 1 + (b \times i) \times~[b \times i] \times~[b \times i] \ar[d]^{id + id \times~[(h)] \times~[(h)]}\\BTree~(i \times b) \ar[d]^{(|f|)} \ar@/^{1cm}/[rr]^{out} & \cong & 1 + (i \times b) \times~BTree~(i \times b) \times~BTree~(i \times b) \ar@/^{1cm}/[ll]^{in} \ar[d]^{id + id \times~(|f|) \times~(|f|)}\\[i \times b] & & 1 + (i \times b) \times~[i \times b] \times~[a] \ar[ll]^{f}\\}$

I’m going to show all the solution here, because the description of the problem is in this quote, and in the paper:

hanoi = hyloBTree f h

f = either (const []) join
where join(x,(l,r))=l++[x]++r

h(d,0) = Left ()
h(d,n+1) = Right ((n,d),((not d,n),(not d,n)))


And here it is, all the code you need to run this example:

inBTree :: Either () (b,(BTree b,BTree b)) -> BTree b
inBTree = either (const Empty) Node

outBTree :: BTree a -> Either () (a,(BTree a,BTree a))
outBTree Empty              = Left ()
outBTree (Node (a,(t1,t2))) = Right(a,(t1,t2))

baseBTree f g = id -|- (f >< g))

cataBTree g = g . (recBTree (cataBTree g)) . outBTree

anaBTree g = inBTree . (recBTree (anaBTree g) ) . g

hyloBTree h g = cataBTree h . anaBTree g

recBTree f = baseBTree id f


## Outroduction

Maybe in the future I will talk more about that subject.

9 04 2009

If you miss something in this post, I suggest you to start in Catamorphisms and Anamorphisms.

A Hylomorphism is just the composition of one catamorphism and then one anamorphism.
$hylo~f~h~=~cata~f~\circ~ana~h$, replacing that by the proper notation we have: $[|f,h|]~=~(|f|)~\circ~[(h)]$

In this post I will use the structure of a binary tree:

data BTree a = Empty | Node(a, (BTree a, BTree a))


I will use the tuples to don’t have to write uncurry’s. As I will show you, when we say that we are making a hylomorphism on a particular data type $T$, what we are trying to say is that the intermediate structure of our combination of catamorphism and anamorphism is that data type $T$. This is the structure throw our morphism will communicate with each other.

## Anamorphism

So, here I will solve the Quicksort algorithm with a hylomorphism over $BTree$.

The intermediate structure being a $BTree$ doesn’t mean that my function will receive $BTree$. My $qSort$ function works over lists. So the first thing to do, is draw the respective anamorphism from $[a]$ to $BTree~a$:

$\xymatrix@!=4.7cm {[a] \ar[r]^h \ar[d]^{[(h)]} & 1 + a \times~[a] \times~[a] \ar[d]^{id + id \times~[(h)] \times~[(h)]}\\BTree~a & 1 + a \times~BTree~a \times~BTree~a \ar[l]^{in} \\}$

My strategy here is to do all the work in the anamorphism, so, I need a function $h$ with type:
$h : [a] \rightarrow 1 + a \times [a] \times [a]$, or in Haskell $h :: [a] \rightarrow Either () (a, ([a], [a]))$

That function is $qsep$:

qsep :: [a] -> Either () (a, ([a], [a]))
qsep []    = Left ()
qsep (h:t) = Right (h,(s,l))
where (s,l) = part (<h) t

part:: (a -> Bool) -> [a] -> ([a], [a])
part p []                = ([],[])
part p (h:t) | p h       = let (s,l) = part p t in (h:s,l)
| otherwise = let (s,l) = part p t in (s,h:l)


This code is very simple, in $qsep$ I chose a pivotal element (first one), and filter the bigger to one side, and the other ones to the other, just like the algorithm. The function that do all that job is $part$, it process all the list finding the elements that satisfy the condition $p$, to put them in the left side of the tuple, and the others into the right side.

This function by it self don’t do almost anything, it is only a simple part of the algorithm.

## Catamorphism

Next step is to see the diagram for catamorphisms from $BTree~a$ to $[a]$:

$\xymatrix@!=4.7cm {BTree~a \ar[d]^{(|f|)} \ar[r]^{out} & 1 + a \times~BTree~a \times~BTree~a \ar[d]^{id + id \times~(|f|) \times~(|f|)}\\[a] & 1 + a \times~[a] \times~[a] \ar[l]^{f}\\}$

As I said before, the heavy duty is on the side of the anamorphism, so here, the catamorphism will be very very simple. In fact it is.

inord :: Either a (a, ([a], [a])) -> [a]
inord = either (const []) join
where join(x,(l,r))=l++[x]++r


That right! The only thing that the catamorphism do is a inorder passage over the structures $a + a \times [a] \times [a]$, which is very simple, as as shown by the code.

## Hylomorphism

The first thing is to draw the diagram, now for the hylomorphism, the composition of the cata with the ana:

$\xymatrix@!=2.7cm {[a] \ar@/_{1.8cm}/[dd]^{[|f,h|]} \ar[rr]^{h} \ar[d]^{[(h)]} & & 1 + a \times~[a] \times~[a] \ar[d]^{id + id \times~[(h)] \times~[(h)]}\\BTree~a \ar[d]^{(|f|)} \ar@/^{1cm}/[rr]^{out} & \cong & 1 + a \times~BTree~a \times~BTree~a \ar@/^{1cm}/[ll]^{in} \ar[d]^{id + id \times~(|f|) \times~(|f|)}\\[a] & & 1 + a \times~[a] \times~[a] \ar[ll]^{f}\\}$

Once having made the two most important parts of the function (the ana and cata), the hylo is very simple to do. You just have to make a function $hyloBTree$:

hyloBTree h g = cataBTree h . anaBTree g


And our function $qSort$ bacame:

qSort :: Ord a => [a] -> [a]
qSort = hyloBTree inord qsep


And that’s it, now I’m going to show you the all code that you need to put all the things together and working.

inBTree :: Either () (b,(BTree b,BTree b)) -> BTree b
inBTree = either (const Empty) Node

outBTree :: BTree a -> Either () (a,(BTree a,BTree a))
outBTree Empty              = Left ()
outBTree (Node (a,(t1,t2))) = Right(a,(t1,t2))

baseBTree f g = id -|- (f >< g))

cataBTree g = g . (recBTree (cataBTree g)) . outBTree

anaBTree g = inBTree . (recBTree (anaBTree g) ) . g

hyloBTree h g = cataBTree h . anaBTree g

recBTree f = baseBTree id f


## Outroduction

If you need more explanations feel free to contact me.

8 04 2009

First I would like to introduce the notation that I use here. The pointfree notation is good to see a program (functions) data flow and as composition of functions, combination of functions, if you prefer. This style is characterized by not using variables in declaration of functions. Haskell allow us to implement that notation natively. The dual of the pointfree notation is the pointwise one.

A simple example of a function in pointwise style:

f n = (n+2)*10 -- pointwise


The dual in pointfree would be:

f = (*10) . (+2) -- pointfree


## Clarifications

First of all to define a function, for example $f$, i can say:

$\xymatrix {\mathbb{N} \ar[r]^{f} & \mathbb{R}}$, $\xymatrix {\mathbb{R} & \mathbb{N}\ar[l]^{f}}$ or $\xymatrix {\mathbb{N} \ar[d]^{f} \\ \mathbb{R}}$.

I will assume that you are familiarized with infix notation, $either$, and composition $(\circ)$ functions.

## Types

For this post I need to explain the data type we will going to use. In Haskell we define it by:

data Tree a = Node a [Tree a]


Let’s create the same, but more convenient. Consider the following isomorphic type for $Tree$:

data Tree a = Node (a, [Tree a])

We could see $Node$ as a the following function:

Node :: (a, [Tree a]) -> Tree a


So typologically we have $(a, [Tree~a])$. We use $(\times)$ to define that two things occurs in parallel, like tuples do, so we can redefine it: $(a \times~[Tree~a])$

Now we can say that $(Tree~a)$ is isomorphic to $(a \times~[Tree~a])$.
This is something to say that $(Tree~a)$ and $(a \times~[Tree~a])$ keep the same information without any change. We represent that formally as: $(Tree~a) \cong~(a \times~[Tree~a])$.

## Anamorphisms

Let $A$, $B$, $C$, $D$ be Inductive data types (sets) and $in$, $ana$, $rec$ functions.

$\xymatrix@!=10pc{A \ar[r]^{h} \ar[d]^{ana(h)_{Tree}} & B \ar[d]^{rec_{Tree}} \\C & D\ar[l]^{in_{Tree}}\\}$

$ana(h_{Tree})$ is the anamorphism of $h$ if the diagram commute.

We use the notation $rec_{Tree}$ to say that function $rec$ in not generic, but only works for data $Tree$. The same happens with $in$ and $ana$. We will write $ana(h)_{Tree}$ using the composition of $in$, $ana$ and $rec$ functions. That way we are breaking our problem in small ones. So, in the end we will have the following definition for $ana(h)_{Tree}$:

$ana(h)_{Tree} = in_{Tree} \circ rec_{Tree} \circ h$

The function that we want is $ana(h)$, and that function is over $(Tree~a)$ so we have:

ana :: (A -> B) -> A -> Tree c

Type $C$ is $(Tree~c)$. Maybe this isn’t clear yet, let’s start with function $in$

### function in

The function $in_{Tree}$ is responsible to create the isomorphism between $(Tree~a)$ and $(a \times~[Tree~a])$, so the code could be something like this:

inTree :: Tree a -> (a, [Tree a])
inTree    = Node


In Haskell we represent the type $(\times)$ as $(,)$. So, type $D$ is $(a \times~[Tree~a])$. So by now, we already know the following unifications $C \sim Tree~c$ and $D \sim c \times~[Tree~c]$. So now our graphic is:
$\xymatrix@!=10pc{A \ar[r]^{h} \ar[d]^{ana(h)_{Tree}} & B \ar[d]^{rec_{Tree}} \\Tree~c & c \times~[Tree~c]\ar[l]^{in_{Tree}}\\}$

### function $h$

The function $h$ is also known as *gen*, here is where we said the step that pattern do. This is the only function we need to take care, if this function is good, our problem is solved. Now image that our problem is:

Suppose that the pair of positive integers (v, p) denotes the number of red balls (v) and black (p) that is inside a bag, the balls which are taking randomly, successively, until the bag is empty.

This is the point-wise version of the function we want to convert to pointfree using anamorphisms. This function represent as a tree, all possible states of the bag over these experiences.

state :: (Int,Int) -> Tree (Int,Int)
state(0,0) = Node (0,0) []
state(v,0) = Node (v,0) [state(v-1,0)]
state(0,p) = Node (0,p) [state(0,p-1)]
state(v,p) = Node (v,p) [state(v-1,p),state(v,p-1)]


If we want that “latex state$became an anamorphism, we have to say that our type $A$ unify ($\sim$) with $Int \times~Int$, and $Tree~c$ became more restrict, and unify with $Tree (Int \times~Int)$. A consequence of changing the co-domain of $in_{Tree}$ is changing the domain of it to $(Int \times~Int) \times~[Tree (Int \times~Int)]$. We represent $ana(h)$ as $[( h )]$. Now we can be more specific with our graphic: $\xymatrix@!=14pc{Int \times~Int \ar[r]^{h} \ar[d]^{[(h)]_{Tree}} & B \ar[d]^{rec_{Tree}} \\Tree~(Int \times~Int) & (Int \times~Int) \times~[Tree~(Int \times~Int)]\ar[l]^{in_{Tree}}\\}$ ### function rec Here we have to get a function $rec$ that co-domain is $(Int \times~Int) \times~[Tree~(Int \times~Int)]$. Probably the best is to pass the first part of the tuple (part with type $(Int \times~Int)$) and the rest (part with type $[Tree~(Int \times~Int)]$) is just a $map$ of the function $[(h)]_{Tree}$. So, now our graphic is: $\xymatrix@!=14pc{Int \times~Int \ar[r]^{h} \ar[d]^{[(h)]_{Tree}} & (Int \times~Int) \times~[(Int \times~Int)] \ar[d]^{rec_{Tree}} \\Tree~(Int \times~Int) & (Int \times~Int) \times~[Tree~(Int \times~Int)]\ar[l]^{in_{Tree}}\\}$ As you can see, the second part of the co-domain of $h$ is the type of function $map~[(h)]_{Tree}$: $map~[(h)]_{Tree}~:~[(Int \times~Int)] \rightarrow~[Tree(Int \times~Int)]$ So our final graphic became: $\xymatrix@!=14pc{Int \times~Int \ar[r]^{h} \ar[d]^{[(h)]_{Tree}} & (Int \times~Int) \times~[(Int \times~Int)] \ar[d]^{id \times~map~[(h)]_{Tree}} \\Tree~(Int \times~Int) & (Int \times~Int) \times~[Tree~(Int \times~Int)]\ar[l]^{in_{Tree}}\\}$ Now, we just have to define the function $h$ and apply them to our anamorphism of $Tree$. h :: (Int, Int) -> ( (Int, Int), [ (Int, Int) ] ) h(0,0) = ( (0,0), [] ) h(v,0) = ( (v,0), [ (v-1,0) ] ) h(0,p) = ( (0,p) [ (0,p-1) ] ) h(v,p) = ( (v,p), [ (v-1,p), (v,p-1) ] )  And this is it! Now we can say that: $state \equiv~ana_{Tree}$ where $ana(h)_{Tree} = in_{Tree} \circ~id~><~map~ana(h)_{Tree} \circ h$ ## Outroduction Here is all the code you need to run this example in Haskell: module AnamorphismExample where infix 5 >< i1 = Left i2 = Right p1 = fst p2 = snd data Tree a = Node (a, [Tree a]) deriving Show split :: (a -> b) -> (a -> c) -> a -> (b,c) split f g x = (f x, g x) (><) :: (a -> b) -> (c -> d) -> (a,c) -> (b,d) f >< g = split (f . p1) (g . p2) inTree :: (a, [Tree a]) -> Tree a inTree = Node anaTree h = inTree . (id >< map (anaTree h)) . h -- our function h_gen :: (Int, Int) -> ( (Int, Int), [ (Int, Int) ] ) h_gen(0,0) = ( (0,0), [] ) h_gen(v,0) = ( (v,0), [ (v-1,0) ] ) h_gen(0,p) = ( (0,p) , [ (0,p-1) ] ) h_gen(v,p) = ( (v,p), [ (v-1,p), (v,p-1) ] ) state = anaTree h_gen Pass a year since I promised this post. The next will be on hylomorphisms I promise not take too that much. ## Capture and dissect network traffic 1 04 2009 Currently I am doing research at the University of Minho in the group of distributed systems, with duration of one year. My job is to find a way to identify specific links between a user and a distributed system. The general idea is to draw a map of services in a distributed system. This post only refers to the first milestone. The proposal was to make such a system using Snort. ## Snort Snort is a Network intrusion detection system, that means with Snort you can detect malicious activity in your network. We can detect many types of network attacks. We can identify DoS, DDoS attacks, port scans, cracking attempts, and much more. Snort can operate in two different ways. We can set up Snort to run in passive mode, putting it to listen in promiscuous mode. That is, because Ethernet network switches send traffic to all computers connected to itself, we get traffic addressed to other machines on the network. To do this we only need to connect to the network and turn Snort on in our machine, no one knows that we are recording every traffic (including traffic destined for other computers). Snort may also run in active mode. This “active” is not able to modify the data channel, but to be able to be installed in a network, a router for example and reap more information than in passive mode. Thus it makes sense to use the capacity of rules that Snort supports, to filter the traffic that it read. To do this, Snort capture all packets that pass the network and interprets each. As the rules we have defined Snort tries to find these patterns in each packet, or each set of packets and take certain actions for each of them. For example, if a large number of TCP requests reach a particular host, to a large number of ports in a short space of time we probably are the target of a port scan. NIDS like Snort know how to find these patterns and alerting the network administrator. ## Objective Our aim was to use Snort to capture all traffic into passive mode. root@pig:# snort -u snort -g snort -D -d -l /var/log/snort -c /etcsnort/snort.debian.conf -i eth0 We are saving the logs in binary (tcpdump format), for that I use the “-d -l /dir/” flags. I prefer to save all the packets into binary because is more easier to parse, than the structure of files and directories that Snort creates by default. I started by trying to use some language that advised me to try to do the parsing of the file created by snort. Initially started to use python, but only find a tcpdump parser and could not get more than one file translated in tcpdump to hexadecimal. After that I tried to use Haskell and I was amazed! ## Haskell and packet parsing House is a Haskell Operative System done by The Programatica Project. This is a system than can serve as a platform for exploring various ideas relating to low-level and system-level programming in a high-level functional language. And indeed helped me a lot in doing my job. This project have already done a lot of parsers for network packets. It implements the Ethernet, IPv4, IPv6, TCP, UDP, ICMP, ARP and I think is all. The libpcap (tcpdump parser) is already implemented in Haskell too, so is very simple to parse a complete packet: getPacket :: [Word8] -> InPacket getPacket bytes = toInPack$ listArray (0,Prelude.length bytes-1) $bytes -- Ethernet | IP | TCP | X getPacketTCP :: [Word8] -> Maybe (NE.Packet (NI4.Packet (NT.Packet InPacket))) getPacketTCP bytes = doParse$ getPacket bytes :: Maybe (NE.Packet (NI4.Packet (NT.Packet InPacket)))


As you can see is too easy to have a compete structure of a packet parsed with this libraries. The problem is that they don’t have already implemented a application packet parser. So, according to that image:

This is the level of depth we can go with this libraries. What is very good, but not perfect for me :S

My supervisor told me to start searching a new tool to do this job. I was sad because I could not do everything in Haskell. But it is already promised that I will continue this project in Haskell. You can see the git repo here.

I find tshark, a great tool to dissect and analyze data inside tcpdump files.

## The power of tshark

tshark is the terminal based Wireshark, with it we can do everything we do with wireshark.

Show all communications with the IP 192.168.74.242

root@pig:# tshark -R "ip.addr == 192.168.74.242" -r snort.log

...
7750 6079.816123 193.136.19.96 -> 192.168.74.242 SSHv2 Client: Key Exchange Init
7751 6079.816151 192.168.74.242 -> 193.136.19.96 TCP ssh > 51919 [ACK] Seq=37 Ack=825 Win=7424 Len=0 TSV=131877388 TSER=1789588
7752 6079.816528 192.168.74.242 -> 193.136.19.96 SSHv2 Server: Key Exchange Init
7753 6079.817450 193.136.19.96 -> 192.168.74.242 TCP 51919 > ssh [ACK] Seq=825 Ack=741 Win=7264 Len=0 TSV=1789588 TSER=131877389
7754 6079.817649 193.136.19.96 -> 192.168.74.242 SSHv2 Client: Diffie-Hellman GEX Request
7755 6079.820784 192.168.74.242 -> 193.136.19.96 SSHv2 Server: Diffie-Hellman Key Exchange Reply
7756 6079.829495 193.136.19.96 -> 192.168.74.242 SSHv2 Client: Diffie-Hellman GEX Init
7757 6079.857490 192.168.74.242 -> 193.136.19.96 SSHv2 Server: Diffie-Hellman GEX Reply
7758 6079.884000 193.136.19.96 -> 192.168.74.242 SSHv2 Client: New Keys
7759 6079.922576 192.168.74.242 -> 193.136.19.96 TCP ssh > 51919 [ACK] Seq=1613 Ack=1009 Win=8960 Len=0 TSV=131877415 TSER=1789605
...


Show with a triple: (time, code http, http content size), separated by ‘,’ and between quotation marks.

root@pig:# tshark -r snort.log -R http.response -T fields -E header=y -E separator=',' -E quote=d -e frame.time_relative -e http.response.code -e http.content_length

...
"128.341166000","200","165504"
"128.580181000","200","75332"
"128.711618000","200","1202"
"149.575548000","206","1"
"149.719938000","304",
"149.882290000","404","338"
"150.026474000","404","341"
"150.026686000","404","342"
"150.170295000","304",
"150.313576000","304",
"150.456650000","304",
...


Show a tuple of arity 4 with: (time, source ip, destination ip, tcp packet size).

root@pig:# tshark -r snort.log -R "tcp.len>0" -T fields -e frame.time_relative -e ip.src -e ip.dst -e tcp.len

...
551.751252000   193.136.19.96   192.168.74.242  48
551.751377000   192.168.74.242  193.136.19.96   144
551.961545000   193.136.19.96   192.168.74.242  48
551.961715000   192.168.74.242  193.136.19.96   208
552.682260000   193.136.19.96   192.168.74.242  48
552.683955000   192.168.74.242  193.136.19.96   1448
552.683961000   192.168.74.242  193.136.19.96   1448
552.683967000   192.168.74.242  193.136.19.96   512
555.156301000   193.136.19.96   192.168.74.242  48
555.158474000   192.168.74.242  193.136.19.96   1448
555.158481000   192.168.74.242  193.136.19.96   1400
556.021205000   193.136.19.96   192.168.74.242  48
556.021405000   192.168.74.242  193.136.19.96   160
558.874202000   193.136.19.96   192.168.74.242  48
558.876027000   192.168.74.242  193.136.19.96   1448
...


Show with a triple: (source ip, destination ip, port of destination ip).

root@pig:# tshark -r snort.log -Tfields  -e ip.src -e ip.dst -e tcp.dstport

...
192.168.74.242  193.136.19.96   37602
192.168.74.242  193.136.19.96   37602
193.136.19.96   192.168.74.242  22
192.168.74.242  193.136.19.96   37602
193.136.19.96   192.168.74.242  22
193.136.19.96   192.168.74.242  22
192.168.74.242  193.136.19.96   37602
192.168.74.242  193.136.19.96   37602
192.168.74.242  193.136.19.96   37602
193.136.19.96   192.168.74.242  22
193.136.19.96   192.168.74.242  22
193.136.19.96   192.168.74.242  22
193.136.19.96   192.168.74.242  22
192.168.74.242  193.136.19.96   37602
192.168.74.242  193.136.19.96   37602
...


## Statistics

Hierarchy of protocols

root@pig:# tshark -r snort.log -q -z io,phs
frame                                    frames:7780 bytes:1111485
eth                                    frames:7780 bytes:1111485
ip                                   frames:3992 bytes:848025
tcp                                frames:3908 bytes:830990
ssh                              frames:2153 bytes:456686
http                             frames:55 bytes:19029
http                           frames:5 bytes:3559
http                         frames:3 bytes:2781
http                       frames:2 bytes:2234
http                     frames:2 bytes:2234
data-text-lines                frames:10 bytes:5356
tcp.segments                     frames:3 bytes:1117
http                           frames:3 bytes:1117
media                        frames:3 bytes:1117
udp                                frames:84 bytes:17035
nbdgm                            frames:50 bytes:12525
smb                            frames:50 bytes:12525
mailslot                     frames:50 bytes:12525
browser                    frames:50 bytes:12525
dns                              frames:34 bytes:4510
llc                                  frames:3142 bytes:224934
stp                                frames:3040 bytes:182400
cdp                                frames:102 bytes:42534
loop                                 frames:608 bytes:36480
data                               frames:608 bytes:36480
arp                                  frames:38 bytes:2046


### Conversations

We use: -z conv,TYPE,FILTER

TYPE could be:

• eth,
• tr,
• fc,
• fddi,
• ip,
• ipx,
• tcp,
• udp

And the filters are used to restrict the statistics.

root@pig:# tshark -r snort.log -q -z conv,ip,tcp.port==80
================================================================================
IPv4 Conversations
Filter:tcp.port==80
|           | |    Total    |
|Frames Bytes | |Frames Bytes | |Frames Bytes |
193.136.19.148  192.168.74.242 141    13091    202   259651    343   272742
192.168.74.242  128.31.0.36     22     6858     28     4784     50    11642
================================================================================


### IO

We use: -z io,stat,INT,FILTER,…,FILTER

root@pig:# tshark -r snort.log -q -z io,stat,300,'not (tcp.port=22)'
===================================================================
IO Statistics
Interval: 300.000 secs
Column #0:
|   Column #0
Time            |frames|  bytes
000.000-300.000    2161    543979
300.000-600.000    1671    264877
600.000-900.000     508     46224
900.000-1200.000     185     12885
1200.000-1500.000     201     14607
1500.000-1800.000     187     13386
1800.000-2100.000     189     13887
2100.000-2400.000     187     13386
2400.000-2700.000     189     13887
2700.000-3000.000     187     13386
3000.000-3300.000     185     12885
3300.000-3600.000     189     13887
3600.000-3900.000     210     15546
3900.000-4200.000     189     13887
4200.000-4500.000     187     13386
4500.000-4800.000     185     12885
4800.000-5100.000     189     13887
===================================================================


## Conclusion

With tshark we could do everything we want to know what is inside a network packet. The trick is to understand the statistics that tshark generate, and know how to ask it.

Now my work will get a machine to run Snort in an active mode and begin to understand how to use Snort to do all this work of collecting information.

If you feel interested and understand Portuguese, see the presentation:

## Cryptol the language of cryptography

1 04 2009

Pedro Pereira and I are working on a new project in the Masters. The second half of the Masters is composed of a single project suggested by a company. Some companies are forming partnerships in the Masters formal methods, including: the Critical software, SIG and Galois. We chose the Galois because we also are in the area of cryptography and we already knew some work of some people from this company.

The project suggested by Galois was study the Cryptol as a language of specification of cryptographic algorithms. The cipher we used for this study is the SNOW 3G (The SNOW website), later on I will talk about the specification of this cipher. In this post I am only interested to show the language.

I’m going to show you some details about the language. This post is not intend to be a exhaustive explanation of Cryptol, if you looking for that you can go directly to the manuals. This post only relates my experience, and what I like it most with the language.

## Overview

Cryptol is a high-level language that is geared to deal with low-level problems. Is a Domain-specific language to design and implement cryptographic algorithms.
This language has a high percentage of correctness of the implementation of a cipher, because it implements type inference, so we can say that a big part of the language implements correctness. This correctness is also achieved thanks to the architecture of the language – functional. We don’t have side effects – a function only return something inside is codomain.
In Cryptol we have this philosophy that says that everything is a sequence. This is very useful because we are working with low level data (array of bits), so we use sequences to represent that arrays. We can have nested sequences to have a more structured representation of data. For example, we can simply transform a 32-bit sequence in a 4 1-byte sequence.
The size of this sequences could be implemented as finite or infinite, as we going to see later in this post. Because Cryptol is a high-level language we can also implement polymorphic functions, most of the primitive functions are implemented in polymorphic mode. The way we have to navigate throw the sequences is using recursion, or sequences comprehension, and with these two techniques we can implement recurrences.

If you are a Haskell programmer you just need the next section to learn Cryptol. This language is so look a like with Haskell that even the philosophy seems to have a lot in commune.

## Types in Cryptol

The type $[32]$ means that you have a sequence of 32-bit size. All the types in Cryptol are size oriented. The unit is the $Bit$, that you can use to represent $Bool$. To represent a infinite sequence we use the reserved word $inf$, and we write: $[inf]$ to represent that.

If you want to generate a infinite sequence, we use the syntactic sugar of the sequences like that: $[1~..]$. Cryptol will infer this sequence as type

$[1~..]~:~[inf][1]$

That means this sequence have infinite positions of 1-bit words. The type inference mechanism will always optimize the size that he needs, to represent the information.
So, it infer the type of $[100~..]$ as:

$[100~..]~:~[inf][7]$

Because, it “knows” that needs only 7-bits to represent the decimal $100$. But if you need more, you can force the type of your function.
We implement polymorphism in our types, if we have:

$f~:~[a]b~\rightarrow~[a]b$

This means, that the function $f$ have polymorphism over $b$, because we say that it domain is one sequence of size $a$ of type $b$, and it codomain also. Here we could also see: $f~:~[a][b]c$ meaning that $f$ is a constant of sequences of size $b$ of type $c$, $a$ times.

So, lets talk about some primitive functions in Cryptol, and its types. The $tail$ function have the following type in Cryptol:

$tail~:~\{a~b\}~[a+1]b~\rightarrow~[a]b$

As we can see, Cryptol is so size oriented, that we can use arithmetic operators in types. We can probably infer what this function does just from it type: $tail$ works for all $a$ and $b$ such that if we have one sequence os size $a+1$ of type $b$ it returns one sequence of size $a$ of same type. In fact this function removes the first element of one sequence.

Because of this size oriented philosophy a lot of functions, that change the size of the sequences can be read just from the type.

As you can see in the following list of Cryptol primitive function:

$drop~:~\{ a~b~c \}~( fin~a ,~a~\geq~0)~\Rightarrow~(a ,[ a + b ]~c )~\rightarrow~[ b ]~c$
$take~:~\{ a~b~c \}~( fin~a ,~b~\geq~0)~\Rightarrow~(a ,[ a + b ]~c )~\rightarrow~[ a ]~c$
$join~:~\{ a~b~c \}~[ a ][ b ] c~\rightarrow~[ a * b ]~c$
$split~:~\{ a~b~c \}~[ a * b ] c~\rightarrow~[ a ][ b ]~c$
$tail~:~\{ a~b \}~[ a +1] b~\rightarrow~[ a ]~b$

## Recursion and Recurrence

Cryptol implements Recursion, just like a lot of functional languages do.

Imagine the fibonacci function definition:

It implementation in Crytol is exactly the same as defined mathematically.

fib : [inf]32 -> [inf]32;
fib n = if n == 0 then 0 else if n == 1 then 1 else fib (n-1) + fib (n-2);

Cryptol uses recursion to permit us to iterate throw sequences.

But, If you prefer you can implement a more functional algorithm of fibonacci function in Cryptol:

fib : [inf]32 -> [inf]32;
fib n = fibs @ n;
where {
fibs : [inf]32;
fibs = [0 1] # [| x + y || x <- drop (1,fibs) || y <- fibs |];
};

Here, as you can see, we define a infinite list $fibs$ of all the fibonacci numbers, by calling the $fibs$ inside the sequences comprehension $fibs$, this is called a recurrence, and you can use that too in Cryptol.

## Cryptol vs C

I’m going to show you some part of the implementation of SNOW 3G in C. This is a function called $MUL_{\alpha}$

MULa : [8] -> [32];
MULa(c) = join ( reverse [
( MULxPOW(c, 23 :[32], 0xA9) )
( MULxPOW(c, 245:[32], 0xA9) )
( MULxPOW(c, 48 :[32], 0xA9) )
( MULxPOW(c, 239:[32], 0xA9) ) ] );

/* The function MUL alpha.
Input c: 8-bit input.
Output : 32-bit output.
See section 3.4.2 for details.
\*/
u32 MULalpha(u8 c) {
return
((((u32)MULxPOW(c,23, 0xa9)) << 24 ) |
(((u32)MULxPOW(c, 245,0xa9)) << 16 ) |
(((u32)MULxPOW(c, 48,0xa9)) << 8 ) |
(((u32)MULxPOW(c, 239,0xa9)))) ;
}


You can see that in Cryptol we just say that we want to work with a 32-bit word, and we don’t need to do any shift to our parts of the word. We just join them together. We reverse the sequence, because Cryptol stores words in little-endian, and we want to keep the definition like the specification.

This is a very simple function, so the result in C is not so that different. But if we have a more complex function, we were going to start having a nightmare to write that in C.

## Conclusion

Well, the conclusion is that Cryptol is a language that really help to write low-level algorithms. With Cryptol the specification is formal and easier to read than other languages. A value of Cryptol is that the code can be converted to other languages, such as VHDL and C.

If you’re interested, take a look at the presentation that we did.