A Simple Lambda Calculus Evaluator

A Simple Lambda Calculus Evaluator - III

January 19, 2011

In the previous post, we translated lambda calculus expressions into a syntax tree using compiler-generator tools. Now we are going to evaluate an expression by traversing its syntax tree. First, I will introduce the evaluation algorithm. Then the implementation is presented along with some code snippets. Finally, an interactive interface that combines all the parts is implemented. The full source code can be checked out here.

Evaluation Algorithm

The evaluation algorithm takes a syntax tree as input and outputs another syntax tree, which is the result of evaluating the original one.

Let’s consider the simplest case first: an identifier node, which has no child nodes. It evaluates to itself because no reduction can be applied to it. An abstraction node has an identifier child node and an expression child node, which is a syntax tree itself. The expression child is evaluated first and replaced by the resulting node; the identifier child stays the same. An application node has two expression child nodes; both are evaluated first and replaced by the resulting nodes. Finally, following the semantics of lambda application, a beta-reduction is performed on the application node, because beta-reduction captures the idea of function application.

From the analysis above, we can work out a recursive algorithm that does a depth-first traversal of the syntax tree and evaluates it in a bottom-up manner. Here is the pseudocode:

evaluate(tree):
    if tree is identifier node:
        return tree;
    else if tree is abstraction node:
        tree->right = evaluate(tree->right);
        return tree;
    else if tree is application node:
        tree->left = evaluate(tree->left);
        tree->right = evaluate(tree->right);
        tree = betaReduction(tree);
        return tree;

Now the problem is reduced to the beta-reduction of a lambda calculus expression. In the first post, we learned that beta-reduction can be defined simply in terms of substitution, so our problem is reduced again, this time to implementing the substitution algorithm for expressions. We can derive a recursive algorithm directly from the substitution rules defined in the first post:

substitute(tree, var, sub):
    if tree is identifier node:
        if tree.name equals var.name:
            return sub;
        else:
            return tree;
    else if tree is application node:
        tree->left = substitute(tree->left, var, sub);
        tree->right = substitute(tree->right, var, sub);
        return tree;
    else if tree is abstraction node:
        if tree->left.name not equals var.name:
            while tree->left is a free variable in sub:
                tree = alphaConversion(tree);
            tree->right = substitute(tree->right, var, sub);
        return tree;

Pay attention to the abstraction node case, where we check the conditions of the last rule. If the bound identifier occurs free in sub, substituting directly would capture it, so an alpha-conversion is applied to the syntax tree before the substitution.
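A small worked example may make the capture problem concrete. The bracket notation for substitution below is an assumption and may differ from the notation used in the first post; the fresh name z is chosen arbitrarily.

```latex
\begin{align*}
(\lambda y.\, x\; y)[x := y] &\neq \lambda y.\, y\; y
  && \text{(naive substitution captures the free } y\text{)} \\
(\lambda y.\, x\; y)[x := y] &= (\lambda z.\, x\; z)[x := y]
  && \text{(alpha-conversion renames } y \text{ to } z\text{)} \\
&= \lambda z.\, y\; z
  && \text{(the substitution is now safe)}
\end{align*}
```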

There are two remaining problems: alpha-conversion and free variables. The definition of free variables in the first post is very straightforward, so I won’t duplicate the algorithm here. With regard to alpha-conversion, the process consists of the following steps:

1. Find the set of free variables of the expression child node.
2. Pick a new identifier name that is different from the old one and not in the free variable set from step 1.
3. Substitute the new identifier for all free occurrences of the old identifier in the expression child node.
4. Replace the old identifier node with the new one.

In step 3, the substitute procedure above is used. All the algorithms used by this evaluator have now been presented, so we can start writing the code.

Implementation

It’s time to implement the evaluator. Code snippets for the different algorithms are shown in this section. To emphasize the most important parts, error handling and memory release are omitted.

First and foremost, let’s see the function that finds the free variable set of an expression.

// Returns the set of free variables of expr.
static VarSet * FV(TreeNode *expr) {
    VarSet* set = NULL;
    VarSet* set1 = NULL;
    VarSet* set2 = NULL;
    switch(expr->kind) {
        case IdK:
            // a lone identifier is free
            set = newVarSet();
            addVar(set,expr->name);
            break;
        case AbsK:
            // free variables of the body, minus the bound identifier
            set = FV(expr->children[1]);
            deleteVar(set,expr->children[0]->name);
            break;
        case AppK:
            // union of the free variables of both sides
            set = newVarSet();
            set1 = FV(expr->children[0]);
            set2 = FV(expr->children[1]);
            unionVarSet(set,set1,set2);
            break;
        default:
            fprintf(errOut,"Unknown expression type.\n");
    }
    return set;
}

VarSet is a data structure that represents a set of variables. It uses an internal hash table to store the variables. The functions newVarSet, addVar, deleteVar and unionVarSet do what their names suggest. Refer to the file varset.c for the VarSet implementation.
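For readers who do not have varset.c at hand, the interface can be approximated as follows. This is only a minimal sketch: it stores names in a linked list rather than the hash table used by the original, and the struct layout and exact signatures are assumptions inferred from the calls in this post.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified VarSet: a singly linked list of names.
   The original varset.c uses a hash table instead. */
typedef struct VarNode {
    char *name;
    struct VarNode *next;
} VarNode;

typedef struct VarSet {
    VarNode *head;
} VarSet;

/* Creates an empty set; returns NULL if allocation fails. */
VarSet *newVarSet(void) {
    VarSet *set = malloc(sizeof *set);
    if (set != NULL) set->head = NULL;
    return set;
}

/* Returns 1 if name is in the set, 0 otherwise. */
int contains(const VarSet *set, const char *name) {
    for (const VarNode *n = set->head; n != NULL; n = n->next)
        if (strcmp(n->name, name) == 0) return 1;
    return 0;
}

/* Adds a copy of name unless it is already present. */
void addVar(VarSet *set, const char *name) {
    if (contains(set, name)) return;
    VarNode *n = malloc(sizeof *n);
    if (n == NULL) return;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) { free(n); return; }
    strcpy(n->name, name);
    n->next = set->head;
    set->head = n;
}

/* Removes name if present; does nothing otherwise. */
void deleteVar(VarSet *set, const char *name) {
    VarNode **link = &set->head;
    while (*link != NULL) {
        if (strcmp((*link)->name, name) == 0) {
            VarNode *dead = *link;
            *link = dead->next;
            free(dead->name);
            free(dead);
            return;
        }
        link = &(*link)->next;
    }
}

/* Adds every element of a and of b to dest. */
void unionVarSet(VarSet *dest, const VarSet *a, const VarSet *b) {
    for (const VarNode *n = a->head; n != NULL; n = n->next) addVar(dest, n->name);
    for (const VarNode *n = b->head; n != NULL; n = n->next) addVar(dest, n->name);
}
```

A list makes contains linear in the size of the set, which is acceptable for the handful of variables in a typical expression; a hash table pays off only for much larger sets.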

The code snippet for alpha-conversion looks like this:

TreeNode * alphaConversion(TreeNode *expr) {
    // free variables of the body; the new name must avoid them
    VarSet* set = FV(expr->children[1]);
    char * name = strdup(expr->children[0]->name);
    // pick a new name by cycling the last character through a..z
    while(strcmp(name,expr->children[0]->name)==0 ||  contains(set,name)==1) {
        char lastchar = name[strlen(name)-1];
        name[strlen(name)-1] = 'a' + (lastchar+1-'a')%('z'-'a'+1);
    }
    TreeNode *var = newTreeNode(IdK);
    var->name = name;
    // rename the free occurrences of the old identifier in the body
    TreeNode *result = substitute(expr->children[1], expr->children[0], var);

    expr->children[1] = result;
    expr->children[0] = var;
    return expr;
}

The method for picking a new variable name is to replace the last character of the name with the letter that comes after it in the alphabet, wrapping around from z to a. This works in most cases, though the loop never terminates if all the candidate names are already taken.
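One way to avoid running out of candidates is to append a numeric suffix instead of rotating the last letter. The sketch below is not part of the original evaluator: freshName and isUsed are hypothetical helpers, the set of taken names is passed as a plain array to keep the example self-contained, and it assumes the scanner accepts digits inside identifiers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if name appears among the count strings in used. */
static int isUsed(const char *name, const char *const *used, size_t count) {
    for (size_t i = 0; i < count; i++)
        if (strcmp(name, used[i]) == 0) return 1;
    return 0;
}

/* Hypothetical helper: returns a newly allocated name of the form
   base0, base1, ... that does not appear in used, or NULL if the
   allocation fails. The caller frees the result. */
char *freshName(const char *base, const char *const *used, size_t count) {
    size_t cap = strlen(base) + 24;   /* room for the digits and '\0' */
    char *name = malloc(cap);
    if (name == NULL) return NULL;
    for (unsigned long n = 0; ; n++) {
        snprintf(name, cap, "%s%lu", base, n);
        /* at most count candidates can be taken, so this terminates */
        if (!isUsed(name, used, count)) return name;
    }
}
```

Because a suffix is always appended, the result also differs from the old name, so the separate strcmp check against the original identifier is no longer needed.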

Here comes the most important function, substitute, which is used by both alpha-conversion and beta-reduction:

// Replaces the free occurrences of var in expr with sub.
static TreeNode *substitute(TreeNode *expr, TreeNode *var, TreeNode *sub) {
    const char * parname = NULL;
    TreeNode * result = NULL;
    switch(expr->kind) {
        case IdK:
            if(strcmp(expr->name,var->name)==0) {
                return sub;
            }else {
                return expr;
            }
        case AbsK:
            parname = expr->children[0]->name;
            // if the abstraction binds var itself, var is not free in the body
            if(strcmp(parname,var->name)!=0) {
                VarSet* set = FV(sub);
                while(contains(set,parname)) {  // do alpha conversion
                    expr = alphaConversion(expr);
                    parname = expr->children[0]->name;
                }
                result = substitute(expr->children[1],var,sub);
                expr->children[1] = result;
            }
            return expr;
        case AppK:
            // substitute in both sides of the application
            result = substitute(expr->children[0],var,sub);
            expr->children[0] = result;
            result = substitute(expr->children[1],var,sub);
            expr->children[1] = result;
            return expr;
        default:
            fprintf(errOut,"Unknown expression type.\n");
    }
    return expr;
}

It recursively applies itself to the child nodes of the expression.
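Note that substitute returns the same sub pointer for every occurrence of the variable, so after substituting into a body such as x x, two children point to one shared node. Since the functions here modify nodes in place, sharing can lead to surprises, and a tree-freeing routine would likely visit the shared node twice. One possible remedy is to return a copy of sub in the IdK case. The sketch below assumes a TreeNode layout inferred from the fields used in this post; copyTree is a hypothetical helper, not part of the original source.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed node layout, inferred from the fields used in this post. */
typedef enum { IdK, AbsK, AppK } ExprKind;

typedef struct TreeNode {
    ExprKind kind;
    char *name;                     /* set for IdK nodes only */
    struct TreeNode *children[2];   /* unused slots are NULL */
} TreeNode;

/* Hypothetical helper: returns a deep copy of expr, or NULL when
   expr is NULL or the node itself cannot be allocated. */
TreeNode *copyTree(const TreeNode *expr) {
    if (expr == NULL) return NULL;
    TreeNode *copy = malloc(sizeof *copy);
    if (copy == NULL) return NULL;
    copy->kind = expr->kind;
    copy->name = NULL;
    if (expr->name != NULL) {
        /* duplicate the name so the copy owns its own string */
        copy->name = malloc(strlen(expr->name) + 1);
        if (copy->name != NULL) strcpy(copy->name, expr->name);
    }
    /* copy both subtrees recursively; NULL children stay NULL */
    for (int i = 0; i < 2; i++)
        copy->children[i] = copyTree(expr->children[i]);
    return copy;
}
```

With such a helper, the IdK branch could return copyTree(sub) instead of sub, at the cost of extra allocations for every substituted occurrence.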

Now, we can deal with the beta-reduction:

TreeNode * betaReduction(TreeNode *expr) {
    TreeNode* left = expr->children[0];
    if(left->kind==IdK || left->kind==AppK) {
        // the left side is not an abstraction, so there is nothing to reduce
        return expr;
    }else if(left->kind==AbsK) {
        // replace the bound identifier in the body with the argument
        TreeNode* result = substitute(left->children[1],left->children[0],expr->children[1]);
        return result;
    }
    return expr;
}

Finally, all the building blocks are ready. We can implement the main evaluation function:

TreeNode * evaluate(TreeNode *expr) {
    if(expr!=NULL) {
        switch(expr->kind) {
            case IdK:
                return expr;
            case AbsK:
                // evaluate the body; the bound identifier stays the same
                expr->children[1] = evaluate(expr->children[1]);
                return expr;
            case AppK:
                // evaluate both sides first, then apply
                expr->children[0] = evaluate(expr->children[0]);
                expr->children[1] = evaluate(expr->children[1]);
                return betaReduction(expr);
            default:
                fprintf(errOut,"Unknown expression kind.\n");
        }
    }
    return expr;
}

Evaluator Driver

The different parts, including the scanner, parser and evaluator, have been implemented separately. We need a driver to combine them all. It should provide an interactive interface that reads a lambda calculus expression from user input and outputs the evaluated expression in a human-readable format.

Here is the code:

FILE* in;
FILE* out;
FILE* errOut;

TreeNode * tree = NULL;    // used in the parser

#define BUFF_SIZE 255

// YY_BUFFER_STATE, yy_scan_string() and yy_delete_buffer() are
// provided by the Flex-generated scanner
int main(int argc, char* argv[]) {
    in = stdin;
    out = stdout;
    errOut = stderr;

    char buff[BUFF_SIZE];

    fprintf(out,"Welcome to Lambda Calculus Evaluator.\n");
    fprintf(out,"Press Ctrl+C to quit.\n\n");
    while(1) {
        fprintf(out,"> ");
        // fgets() already reserves room for the terminating '\0'
        if(fgets(buff,BUFF_SIZE,in)==NULL) {
            break;    // end of input or read error
        }
        // let the scanner read from this line instead of a file
        YY_BUFFER_STATE state = yy_scan_string(buff);
        yyparse();

        tree = evaluate(tree);
        fprintf(out,"-> ");
        printExpression(tree);
        deleteTreeNode(tree);
        tree=NULL;
        yy_delete_buffer(state);    // release the scanner buffer
        fprintf(out,"\n\n");
    }
    return 0;
}

We set the standard input as the input and standard output as the output. The library function fgets() is used to read the user input. We use yy_scan_string() to feed the user input to the parser, so the parser will read input from this buffer rather than the standard input. The yyparse() function is the interface exposed by Yacc to run the parser. The printExpression() function is a utility that prints the expression in a human-readable format.
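The source of printExpression() is not shown in this post. As a rough idea of what such a utility might look like, here is a sketch that formats an expression into a string buffer, using the output format seen in the session below. The TreeNode layout, formatExpression and append are assumptions made for illustration; the original function prints to the output stream and may differ in detail.

```c
#include <stdio.h>
#include <string.h>

/* Assumed node layout, inferred from the fields used in this post. */
typedef enum { IdK, AbsK, AppK } ExprKind;

typedef struct TreeNode {
    ExprKind kind;
    char *name;                     /* set for IdK nodes only */
    struct TreeNode *children[2];
} TreeNode;

/* Appends text to the string in buf without exceeding size bytes. */
static void append(char *buf, size_t size, const char *text) {
    size_t used = strlen(buf);
    if (used + 1 < size)
        strncat(buf, text, size - used - 1);
}

/* Hypothetical formatter: appends expr to the string already in buf.
   Abstractions are written as (lambda x body), applications as
   the two sides separated by a space. */
void formatExpression(const TreeNode *expr, char *buf, size_t size) {
    if (expr == NULL) return;
    switch (expr->kind) {
        case IdK:
            append(buf, size, expr->name);
            break;
        case AbsK:
            append(buf, size, "(lambda ");
            formatExpression(expr->children[0], buf, size);
            append(buf, size, " ");
            formatExpression(expr->children[1], buf, size);
            append(buf, size, ")");
            break;
        case AppK:
            formatExpression(expr->children[0], buf, size);
            append(buf, size, " ");
            formatExpression(expr->children[1], buf, size);
            break;
    }
}
```

Writing into a buffer rather than directly to a stream makes the formatter easy to check in isolation; the caller can then pass the result to fprintf.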

Let’s try some expressions:

$ ./main
Welcome to Lambda Calculus Evaluator.
Press Ctrl+C to quit.

> a
-> a

> (lambda x (lambda y y) a)
-> (lambda x a)

> (lambda x x) a
-> a

Conclusion

This series of posts shows how to write a very simple evaluator for lambda calculus. It covers the scanner, the parser and the evaluator. To keep the language pure, the syntax has some limitations: there is no support for constants, and expressions cannot be wrapped in parentheses to change the left-to-right evaluation order. However, it is rather easy to support those features by extending the syntax. We can do that in the future.

References

Lambda Calculus on Wikipedia.
The LEX & YACC Page.
Lecture notes about lambda calculus, including a lambda calculus evaluator.