From LALR(1) to LL(k)

2 04 2011

I, Pedro Faria and Jose Pedro Silva have been working on a LALR to LL conversor. Our idea was to discover the problems involved in this type of automatic conversion.
Our main goal here was to receive an LALAR(1) grammar and produce an LL(k) one. As a test case we receive an Yacc and produce ANTLR.
For the purpose of testing we used the LogoLISS grammar, a language inspired in LOGO programming language and developed in my university to school grammars and program transformation.

The main Yacc grammar was kind of a big one, here it is:

Liss            : PROGRAM IDENT '{' Body '}'
                ;
Body            : DECLARATIONS Declarations
                     STATEMENTS Statements
                ;
Declarations    : Declaration
                | Declarations Declaration
                ;
Declaration     : Variable_Declaration
                ;
Variable_Declaration : Vars SETA Type ';'
                     ;
Vars        : Var
            | Vars ',' Var
            ;
Var         : IDENT Value_Var
            ;
Value_Var   :
            | '=' Inic_Var
            ;
Type        : INTEGER
            | BOOLEAN
            | ARRAY SIZE NUM
            ;
Inic_Var    : Constant
            | Array_Definition
            ;
Constant    : '(' Sign NUM ')'
            | TRUE
            | FALSE
            ;
Sign        :
            | '+'
            | '-'
            ;
Array_Definition  : '[' Array_Initialization ']'
                  ;
Array_Initialization  : Elem
                      | Array_Initialization ',' Elem
                      ;
Elem        : Sign NUM
            ;
Statements      : Statement
                | Statements Statement
                ;
Statement   : Turtle_Commands
            | Assignment
            | Conditional_Statement
            | Iterative_Statement
            ;
Turtle_Commands  : Step
                 | Rotate
                 | Mode
                 | Dialogue
                 | Location
                 ;
Step     : FORWARD Expression
         | BACKWARD Expression
         ;
Rotate   : RRIGHT
         | RLEFT
         ;
Mode     : PEN UP
         | PEN DOWN
         ;
Dialogue : Say_Statement
         | Ask_Statement
         ;
Location : GOTO NUM ',' NUM
         | WHERE '?'
         ;
Assignment      : Variable '=' Expression
                ;
Variable        : IDENT Array_Acess
                ;
Array_Acess     :
                | '[' Single_Expression ']'
                ;
Expression      : Single_Expression
                | Expression Rel_Op Single_Expression
                ;
Single_Expression  : Term
                   | Single_Expression Add_Op Term
                   ;
Term        : Factor
            | Term Mul_Op Factor
            ;
Factor      : Constant
            | Variable
            | SuccOrPred
            | '(' '!' Expression ')'
            | '(' '+' Expression ')'
            | '(' '-' Expression ')'
            | '(' Expression ')'
            ;
Add_Op      : '+'
            | '-'
            | OU
            ;
Mul_Op      : '*'
            | '/'
            | E
            | MULTMULT
            ;
Rel_Op      : IGUAL
            | DIF
            | '<'             | '>'
            | MENORIGUAL
            | MAIORIGUAL
            | IN
            ;
SuccOrPred  : SuccPred IDENT
            ;
SuccPred    : SUCC
            | PRED
            ;
Say_Statement  : SAY '(' Expression ')'
               ;
Ask_Statement  : ASK '(' STRING ',' Variable ')'
               ;
Conditional_Statement : IfThenElse_Stat
                      ;
Iterative_Statement : While_Stat
                    ;
IfThenElse_Stat     : IF Expression THEN '{' Statements '}' Else_Expression
                    ;
Else_Expression     :
                    | ELSE '{' Statements '}'
                    ;
While_Stat     : WHILE '(' Expression ')' '{' Statements '}'
               ;

This is a toy grammar to help students to understand and get a deep knoledge of some fundamental types of productions in grammars.
The kind of texts this grammar is able to analyse are:

PROGRAM logolissExample {
    DECLARATIONS
    x = (100) , y -> Integer ;
    z -> Boolean ;
    w = TRUE -> Boolean ;
    STATEMENTS
        FORWARD x
        RRIGHT
        y = (100)
        FORWARD e
        RRIGHT
        x = x - (100) + (20)
        FORWARD x + (100)
        RRIGHT
        FORWARD (100)
}

Rules

After some time thinking about this problem we tried to solve the left recursion problem. ANTLR does not know how to handle with left recursion, so we must use the EBNF support to translate this productions.

Grammar rules in BNF provide for concatenation and choice but no specific operation equivalent to the * of regular expressions are provided. In Yacc the only way we get repetition is using the following pattern:

A : A a | a ;

We call this kind of rules a left recursive rule.

So, we discover this two generalized rules to remove left recursion in Yacc grammars (click on the image to expand):

\begin{tabular}{|l|l|c|}\hline\textbf{LALR (BNF)} & \textbf{LL (EBNF)} & Nome da Regra\\\hline\begin{tabular}{ccl}vars & : & var\\& | & vars a b c d e \ldots\end{tabular}&\begin{tabular}{ccl}vars & : & var (a b c d e \ldots)+\end{tabular}&removeLeftRecursionPlusExt\\\hline\begin{tabular}{ccl}vars & : & \\&| & vars a b c d e \ldots\end{tabular}&\begin{tabular}{ccl}vars & : & (a b c d e \ldots)*\end{tabular}&removeLeftRecursionTimesExt\\\hline\end{tabular}

Implementation

So, we start right over the implementation of the conversor. We used Yapp because lately we have been using a lot of Perl, so we are want to get deeper into Perl packages.

We start to implement the definition of a Yacc grammar in Yapp:

productions : production             
            | productions production 
            ;
production  : nonTerminal ':' derivs ';' 
            ;
derivs      : nts            
            | derivs '|' nts 
            |
            ;
nts         : nt     
            | nts nt 
            ;
nt          : terminal    
            | nonTerminal 
            | sep         
            ;
terminal    : STRING_TERMINAL     
            ;
nonTerminal : STRING_NON_TERMINAL
            ;
sep         : SEPARATOR 
            ;

And we move along to represent this information in a structure. We chose an peculiar structure to help us processing it.
This is the structure defined as Haskell types:

type Grammar = [Production]
type Production = HashTable ProductionName [Derivation]
type Derivation = [HashTable NTS Type]
type Type = String
type NTS = String

Easy to find stuff, easy to implement in Perl 🙂
So, this is our Yapp grammar again but this time with semantic actions:

productions : production             { return $_[1]; }
            | productions production {
                                       push @{$_[1]},@{$_[2]};
                                       return $_[1];
                                     }
            ;

production  : nonTerminal ':' derivs ';' { return [ { $_[1] => $_[3] } ]; }
            ;

derivs      : nts            { return [$_[1]]; }
            | derivs '|' nts {
                               push @{$_[1]},$_[3];
                               return $_[1];
                             }
            |                { return [[ { 'epsilon' => 'epsilon' } ]]; }
            ;
nts         : nt     { return $_[1]; }
            | nts nt {
                       push @{$_[1]},@{$_[2]};
                       return $_[1];
                     }
            ;
nt          : terminal    {
                            return [ { $_[1] => 'terminal' } ];
                          }
            | nonTerminal {
                            return [ { $_[1] => 'nonTerminal' } ];
                          }
            | sep         {
                            return [ { $_[1] => 'sep' } ];
                          }
            ;
terminal    : STRING_TERMINAL     {
                                    return $_[1];
                                  }
            ;
nonTerminal : STRING_NON_TERMINAL {
                                    return $_[1];
                                  }
            ;
sep         : SEPARATOR           {
                                    return $_[1];
                                  }
            ;

After we had this structure filled in we applied the two rules shown before and we get rid of left recursion for good.

So, after we process the following grammars with our transformer:

a   : a B C
    | //nothing
    ;
expr : expr "+" expr
     | expr "-" expr
     | expr "*" expr
     ;

We get:

a   : (B C)*
    ;
expr : ( "+" expr)+
     | ( "-" expr)+
     | ( "*" expr)+
     ;

You can see all the code produced (including makefiles) in my github repo.

Other rules

We have noticed that only removing the left recursion makes this grammar work in ANTLR.
We are aware that to fully convert an LALR grammar to LL we need one other step: Left Factoring.

Because our LL system is k-token lookahead we does not have to mind about ambiguous grammars.
Examples like this are not a problem for our destination system.

expr: T A B
    | T C D
    ;

However if our destination grammar was LL(1) we needed to convert this to it’s EBNF form.

Lexer

We also translated the Lex used in flex to ANTLR. The main problem here is that we used only matching functions in Perl and we do not used any grammar for this effect.
The main problem is that the regular expressions in flex are different to regexps in ANTLR.
You can see the Perl code for yourself here.
But you if you want to do a translation from flex to ANTLR better you define a flex grammar.





Permutations in pure functional ActionScript

2 04 2011

I have been working with ActionScript3 and Flex to do apps for Android. I have never used this languages before, so in the beginning I was with my hart open to understand this languages, specially ActionScript3, the only thing I knew about ActionScript is that is used a lot for flash animations and that was it.

At a first glance ActionScript3 seemed to me to be a standard object oriented language, but after I grasp a little bit more and I found that the version 3 has a lot of functional flavor in it.
I started to find more and more examples of it: definition of high-order functions, map, filter and so…

I have to say that I deeply continue to prefer the Haskell notation with concern to high order and the 2 functions, but the point of ActionScript3 was never be totally functional, but incorporate some features of it.

So, outside of work I remember the beautiful definition of permutations in Haskell and I start try to implement the same algorithm in ActionScript3.

perms xs = [x:ps | x <- xs, ps <- perms (xs\\[x])]

For the non Haskell programmers I will explain: We receive the list xs and here use the list comprehension notation, basically we can translate a list comprehension in the form:
[f~x | x \leftarrow list] as a map function this way map~f~list. So, that’s basically the first part of our definition. The second one [x:ps | x \leftarrow list, ps \leftarrow perms(list_2)] is the cartesian product of the two lists.
Finally the definition of the function xs//[x] is simply the exclusion of the element x from the list xs.

So, I first came out with this solution:

public static function perms(xs:Array):Array {
    if(xs.length == 0) {
        return [[]];
    }   
    return xs.map(
        function(x:*,index:uint,array:Array):* {
            var ret:Array = new Array();
            var p:Array = perms( exclude(xs,x) );
            for(var j:uint=0 ; p[j] ; j++) {
                var ps:Array = p[j] as Array;
                ps.unshift(x);
                ret.push(ps);
            }   
            return ret;
        }   
    );  
}

public static function exclude(xs:Array, elem:*):Array {
    return xs.filter(
        function(item:*,index:uint,array:Array):Boolean { return item != elem; }
    );  
}

I had to define the exclude function, because it does not exist in ActionScript3 and I understand that.
The code is not as pretty as you expect to be in Haskell or other functional language, but is pretty much the same, as you ca see in the following definition, where I only use map.

public static function permss(xs:Array):Array {
    if(xs.length == 0) {
        return [[]];
    }
    return xs.map(
        function(x:*,index:uint,array:Array):* {
            var p:Array = permss( exclude(xs,x) );
            return p.map(
                function(y:*,_:uint,__:Array):* {
                    var ps:Array = y as Array;
                    ps.unshift(x);
                    return ps;
                }
            );
        }
    );
}
public static function exclude(xs:Array, elem:*):Array {
    return xs.filter(
        function(item:*,index:uint,array:Array):Boolean { return item != elem; }
    );
}