Correct sorting with Frama-C and some thoughts on Formal Methdos

12 02 2011

A couple of years ago, during my masters on Formal Methods I have been working with automatic provers and I also used Frama-C, this is a tool that allow the user to prove C code directly in the source code, using a special notation in the comments, called ACSL notation.

Frama-C allows you to make two kinds of proofs, security and safety ones. The safety ones are related with arrays index out of bounds access, and so. This kind of proofs are related to the language itself and they are easy to do if you use loop invariants, pre and post conditions.
If you use a high level language, like JAVA you won’t have almost none safety problems.
Because C is too close to machine level code, we can do things that we do not intend (or maybe we do and we use C exactly because it allows this kind of things). For example:

// foo.c file
#include <stdio.h>

int main() {
char *a = "I like you";
char *b = "I hate you";

if(&a < &b) a = *(&a + 1);
else        a = *(&a - 1);

printf("%s\n", a);
}


As you can see, I never used the $b$ variable for nothing, just have declared it. And the result is:

[ulissesaraujocosta@maclisses:c]-$gcc -o foo foo.c [ulissesaraujocosta@maclisses:c]-$ ./foo
I hate you


This lack of security of language C is one of the reasons we need to write safety statements. Of course this kind of things is why C is so fast and powerful, the person in charge is always the programmer. If you are interested in this kind of tricks and want to understand more about this and smashing the stack and so, feel free to read more posts in my blog about this subject.

The other kind of statements (security ones) are related to the functionality of the program and that’s basically where the problem or the effort is, I will talk about this later on. First let’s see the algorithm and the implementation in C.

Code

The algorithm I use here is just a simple example. I used bubble sort, this is a sort algorithm not very efficient, but it uses none more memory then the needed to store the structure you want to sort.
To get a visual understanding of the algorithm (and to see it inefficiency) check out this youtube video.

This is the implementation of the algorithm:

void swap(int *i, int *j) {
int tmp = *i;
*i = *j;
*j = tmp;
}

void bubbleSort(int *vector, int tam) {
int j, i;
j = i = 0;
for(i=0; i<tam; i++) {
for(j=0; j<tam-i-1; j++) {
g_swap = 0;
if (vector[j] > vector[j+1]) {
swap(&vector[j],&vector[j+1]);
}
}
}
}


Pre, Post conditions and thinking formally

So, as you can see in the video (or in the code) the algorithm is pretty much simple, we pick the $i$ element cross the array $n$ times and for each element we compare with $i$, this $n$ times.

We have as pre conditions: The size of the $vector$ must be greater than zero, and all the positions in that vector exists, so in Frama-C we use the $valid\_range(vector, i, j)$, where $i$ and $j$ are indexes of the $vector$ to say that all elements exist.

$tam > 0$

$valid\_range(vector,0,tam-1)$

Ans as pos conditions we must ensure that the array is sorted ( I will talk predicate this later).
You may think that this by itself is enough to make a complete proof, but you are wrong. Image that my function clear all the elements in the array and fill the array with $\{1,2,..,tam\}$, our code will be proved and its wrong!

So, we need to say more… First thing that can pop to your head is OK, we will say that we have the same numbers in the beginning and in the end and you write this:
$\forall_a : 0 \leq a < tam : (\exists_b : 0 \leq b < tam : old(vector(b)) \equiv vector(a))$

In fact this is closer (not yet right), imagine that you give as input:
$\{4,7,9,1,0,3,4\}$. If your code returns $\{0,1,3,4,7,9\}$ (we miss the repeated $4$) the code will be proved.
So, the solution if to make a $Permut$ predicate and prove for the multi set.
So, this are the post conditions:

$sorted(vector,0,tam-1)$

$Permut\{Old,Here\}(vector,0,tam-1);$

Frama-C is so cool because for example at the pos condition if we want to refer to the state in the beginning (before call the function) we use $Old$ and if we want to refer to the moment after the call we heave the $Here$ keyword, remember we are at the post condition, so this wil be executed in the end (so $Here$ means the end of the function call).

Predicates

So, here is the $Sorted$ predicate. Predicates receive a state $L$ and the parameters (just like a function) and they return bool values (true or false). Inside we use regular ACSL notation. Here I define that for an array to be sorted each element must be less or equal to the next one.

/*@ predicate Sorted{L}(int a[], integer l, integer h) =
@   \forall integer i; l <= i < h ==> a[i] <= a[i+1];
@*/


The $Permut$ is defined inductively, so we receive two states $L1$ and $L2$ and the array $a$ and the range where we want to permute.
We write multiple rules for the permutation, reflection, symmetry, transitivity and finally the most important one, the $Swap$. So basically here we say that a permutation is a set of successive swaps.

/*@ inductive Permut{L1,L2}(int a[], integer l, integer h) {
@  case Permut_refl{L}:
@   \forall int a[], integer l, h; Permut{L,L}(a, l, h) ;
@  case Permut_sym{L1,L2}:
@    \forall int a[], integer l, h;
@      Permut{L1,L2}(a, l, h) ==> Permut{L2,L1}(a, l, h) ;
@  case Permut_trans{L1,L2,L3}:
@    \forall int a[], integer l, h;
@      Permut{L1,L2}(a, l, h) && Permut{L2,L3}(a, l, h) ==>
@        Permut{L1,L3}(a, l, h) ;
@  case Permut_swap{L1,L2}:
@    \forall int a[], integer l, h, i, j;
@       l <= i <= h && l <= j <= h && Swap{L1,L2}(a, i, j) ==>
@     Permut{L1,L2}(a, l, h) ;
@ }
@
@ predicate Swap{L1,L2}(int a[], integer i, integer j) =
@      \at(a[i],L1) == \at(a[j],L2)
@   && \at(a[j],L1) == \at(a[i],L2)
@   && \forall integer k; k != i && k != j ==> \at(a[k],L1) == \at(a[k],L2);
@*/


So, as you can see the bubble sort function itself have 18 lines of code, and in the end with the annotations for the proof we end with 90 lines, but we proved it!

Thoughts

My main point here is to show the thinking we need to have if we want to prove code in general. Pick what language you want, this is the easiest way you will have to prove software written in C. Sometimes if your functions are too complex you may need to prove it manually. The problem is not on the Frama-C side, Frama-C only generates the proof obligations to feed to automatic provers, like Yices, CVC3, Simplify, Z3, Alt-Ergo and so.

My point here is to show the cost of proving software. Proving software, specially if the language is too low level (like C – you need to care about a lot more things) is hard work and is not easy to a programmer without theoretical knowledge.
On the other side, you end up with a piece of software that is proved. Of course this proof is always requirements oriented, ny that I mean: if the requirements are wrong and the program is not doing what you expect the proof is along with that.
I do not stand to proof of all the code on the planet, but the proper utilization of FM (formal methods) tools for critical software.

I steel been using Frama-C since I learned it in 2009, nowadays I use it for small critical functions (because I want, I’m not encouraged to do so) and I have to say that the use of FM in the industry is far. As I told you Frama-C is the easiest automatic proof tool you will find at least that I know.

Talking with Marcelo Sousa about the use of FM in industry, we came to the conclusion that the people that are making this kind of tools and have the FM knowledge don’t make companies. I think if more brilliant people like John Launchbury make companies, definitely FM will be more used.

Source code

Here is all the code together if you want to test it:

// #include <stdio.h>

/*@ predicate Sorted{L}(int a[], integer l, integer h) =
@   \forall integer i; l <= i < h ==> a[i] <= a[i+1];
@
@ predicate Swap{L1,L2}(int a[], integer i, integer j) =
@      \at(a[i],L1) == \at(a[j],L2)
@   && \at(a[j],L1) == \at(a[i],L2)
@   && \forall integer k; k != i && k != j ==> \at(a[k],L1) == \at(a[k],L2);
@*/

/*@ inductive Permut{L1,L2}(int a[], integer l, integer h) {
@  case Permut_refl{L}:
@   \forall int a[], integer l, h; Permut{L,L}(a, l, h) ;
@  case Permut_sym{L1,L2}:
@    \forall int a[], integer l, h;
@      Permut{L1,L2}(a, l, h) ==> Permut{L2,L1}(a, l, h) ;
@  case Permut_trans{L1,L2,L3}:
@    \forall int a[], integer l, h;
@      Permut{L1,L2}(a, l, h) && Permut{L2,L3}(a, l, h) ==>
@        Permut{L1,L3}(a, l, h) ;
@  case Permut_swap{L1,L2}:
@    \forall int a[], integer l, h, i, j;
@       l <= i <= h && l <= j <= h && Swap{L1,L2}(a, i, j) ==>
@     Permut{L1,L2}(a, l, h) ;
@ }
@*/

/*@ requires \valid(i) && \valid(j);
@ //assigns *i, *j; //BUG 0000080: Assertion failed in jc_interp_misc.ml
@ ensures \at(*i,Old) == \at(*j,Here) && \at(*j,Old) == \at(*i,Here);
@*/
void swap(int *i, int *j) {
int tmp = *i;
*i = *j;
*j = tmp;
}

/*@ requires tam > 0;
@ requires \valid_range(vector,0,tam-1);
@ ensures Sorted{Here}(vector, 0, tam-1);
@ ensures Permut{Old,Here}(vector,0,tam-1);
@*/
void bubbleSort(int *vector, int tam) {
int j, i;
j = i = 0;
//@ ghost int g_swap = 0;

/*@ loop invariant 0 <= i < tam;
@ loop invariant 0 <= g_swap <= 1;
//last i+1 elements of sequence are sorted
@ loop invariant Sorted{Here}(vector,tam-i-1,tam-1);
//and are all greater or equal to the other elements of the sequence.
@ loop invariant 0 < i < tam ==> \forall int a, b; 0 <= b <= tam-i-1 <= a < tam ==> vector[a] >= vector[b];
@ loop invariant 0 < i < tam ==> Permut{Pre,Here}(vector,0,tam-1);
@ loop variant tam-i;
@*/
for(i=0; i<tam; i++) {
//@ ghost g_swap = 0;
/*@ loop invariant 0 <= j < tam-i;
@ loop invariant 0 <= g_swap <= 1;
//The jth+1 element of sequence is greater or equal to the first j+1 elements of sequence.
@ loop invariant 0 < j < tam-i ==> \forall int a; 0 <= a <= j ==> vector[a] <= vector[j+1];
@ loop invariant 0 < j < tam-i ==> (g_swap == 1) ==> Permut{Pre,Here}(vector,0,tam-1);
@ loop variant tam-i-j-1;
@*/
for(j=0; j<tam-i-1; j++) {
g_swap = 0;
if (vector[j] > vector[j+1]) {
//@ ghost g_swap = 1;
swap(&vector[j],&vector[j+1]);
}
}
}
}

/*@ requires \true;
@ ensures \result == 0;
@*/
int main(int argc, char *argv[]) {
int i;
int v[9] = {8,5,2,6,9,3,0,4,1};

bubbleSort(v,9);

//     for(i=0; i<9; i++)
//         printf("v[%d]=%d\n",i,v[i]);

return 0;
}


If you are interested in the presentation me and pedro gave at our University, here it is: