Textual Machine

May 22, 2011

For a complex expression, such as a recursive function using Y, we may go through dozens of reduction steps until getting to the final result. Currently, our evaluator only gives the final result. However, these intermediate steps are really useful for analysis and troubleshooting sometimes. Just like the manual reductions we did in last post, we can check whether the evaluator follows the reduction rules by examining the intermediate steps. So in this post, I am going to refactor the implementation of the evaluator so that it is able to print the intermediate reduction steps.

Recursive Way

An essential task of the evaluator is to find the redex to be reduced next. In our current implementation, we find it in a recursive way. That is, to find a redex in an expression, we try to find it in its child expressions first. If not found, we check if the expression itself is a redex. In this process, the evaluate function may call itself for some subexpressions. This keeps the code simple and clean. However, it also puts us in a dilemma that we cannot get the whole picture of the evaluation process. In each function invocation, we are only aware of the expression or subexpression which is under evaluation, but we don’t know its role(as a function or an argument) in the original expression, what its parent expression is and so on. All the information is called evaluation context. Actually, we do have it in the sequence of C function calls. In the runtime environment of C, the function invocations are stored in a stack, and for each one, there is a stack frame for it, which keeps data for arguments, results and temporary values. So the expression we are currently evaluating is stored in the top stack frame, and its parent is stored in a previous one, and so on. The evaluation context is hidden in the list of stack frames, but we don’t have an easy way to get them in C.

A potential solution is to use a global variable pointing to the root of the expression tree and print it after each evaluation step. However, this method may not work for all cases. Take (lambda x + x x) 1 as an example, after a beta-reduction, the original tree is deleted with a new tree returned as the result, so the global variable becomes a dangling pointer. Even if this could meet our requirement for printing intermediate evaluation steps, we still cannot get more evaluation context information. So I am going to implement the evaluation algorithm in a different way to make the information easy accessible.

Iterative Way

The problem for recursive way is the evaluation context is saved across many stack frames and only that in the top one is accessible. So a solution we can easily thought of is keeping all the evaluation context information in the same stack frame. We need to get rid of the recursive calls and use an iterative way. To find the redex, we need to traverse the expression tree, and all the work is done in a single evaluate function call. Since everything is in the same stack frame now, we are allowed to get anything about the evaluation context.

Based on the call-by-value evaluation strategy, we can derive the following rules to find a redex:

1) An identifier, constant or abstraction expression is not a redex;
2) If the expression is an application or primitive application, check its child expressions from left to right, if anyone is not a value, find the redex in it. Otherwise, pick the expression as a redex.

Since the leftmost expression and the argument expression is evaluated first, we need a post-order traversal of the expression tree to find the redex and evaluate it. After each step, a new expression is got and it is closer to the final result. The same post-order traversal will be performed on the expression again and again, until a value is returned. For the case that an expression can never reduce to a value, the evaluator will run forever. For another case that something is wrong in the expression, an error message will be printed and the evaluator will returns NULL. The error handling mechanism will be added in the future so values will be returned for erroneous expressions as well. Here is the code for this iterative way:

 1 // file: eval.c
 2 TreeNode * evaluate(TreeNode *expr) {
 3     TreeNode* state = expr;
 4     TreeNode** previous = NULL;
 5     TreeNode* current = NULL;
 6     while(state!=NULL && !isValue(state)) {
 7         previous = &state;
 8         current = state;
 9         while(current!=NULL) {
10             if(!isValue(current->children[0])) {
11                 previous = &current->children[0];
12                 current = current->children[0];
13             }else if(!isValue(current->children[1])) {
14                 previous = &current->children[1];
15                 current = current->children[1];
16             }else { // reduce the current node
17                 if(current->kind==AppK) {   // applications
18                     if(current->children[0]->kind==ConstK) {
19                         fprintf(errOut, "Error: cannot apply a constant to any argument.\n");
20                         fprintf(errOut, "\t\t");
21                         printExpression(current,errOut);
22                         deleteTree(state);
23                         return NULL;
24                     }else if(current->children[0]->kind==IdK) {
25                         // find function from builtin and standard library
26                         TreeNode* fun = resolveFunction(current->children[0]->name);
27                         if(fun==NULL) {
28                             fprintf(errOut, "Error: %s is not a predefined function.\n", current->children[0]->name);
29                             deleteTree(state);
30                             return NULL;
31                         }
32                         deleteTree(current->children[0]);
33                         current->children[0] = fun;
34                         break;
35                     }
36                     current=betaReduction(current);
37                     *previous = current;
38                     break;
39                 }else if(current->kind==PrimiK) {  // primitive application
40                     // only perform primitive operation if operands are constants
41                     if(current->children[0]->kind==ConstK
42                         && current->children[1]->kind==ConstK) {
43                         TreeNode* tmp  = evalPrimitive(current);
44                         deleteTree(current);
45                         current = tmp;
46                         *previous = current;
47                         break;
48                     } else {
49                         fprintf(errOut, "Error: %s can only be applied on constants.\n", current->name);
50                         deleteTree(state);
51                         return NULL;
52                     }
53                 }else {
54                     fprintf(errOut,"Error: Cannot evaluate unkown expression kind.\n");
55                     deleteTree(state);
56                     return NULL;
57                 }
58             }
59         }
60         #ifdef DEBUG
61         // print intermediate steps
62         if(!isValue(state)) {
63             fprintf(out,"-> ");
64             printExpression(state,out);
65             fprintf(out,"\n");
66         }
67         #endif
68     }
69     return state;
70 }

In the code, variable state keeps the intermediate expression during the evaluation, so it is used to print the intermediate steps. Variable current points to the subexpression we are currently working with, and previous is a pointer to the current pointer, so after a beta-reduction or primitive evaluation is performed on the current subexpression, we can make the parent of current point to the new expression by changing the value of previous. The state and previous variable save part of the evaluation context information, and if others are needed, such as parent expression of the redex, they can be added easily. There are two loops in the code. The outer one runs until the evaluation result is a value, and the inner one implements the post-order traversal to find and evaluate the redex. The isValue() function tests if an expression is a value, which is identifier, constant or abstraction for now. The resolveFunction() function finds a predefined function from libraries in the order of builtin library and standard library. After each step, the intermediate expression is printed if the DEBUG flag is set.

Let’s try a simple example:

$ ./main
Welcome to Lambda Calculus Evaluator.
Press Ctrl+C to quit.

> (lambda x * 2 x) ((lambda x x) 3)
-> (lambda x * 2 x) 3
-> * 2 3
-> (lambda x (lambda y x `*` y)) 2 3
-> (lambda y 2 `*` y) 3
-> 2 `*` 3
-> 6

Textual Machine

In a real computer, a set of registers is used to keep the computation states, such as PC, and a list of instructions like ADD and MOV can be performed on the state to transform the machine to a new state. Our evaluator can be considered as a machine too. In contrast, the state is the expression, and the instructions are the reductions that transform expressions to expressions. We call this a textual machine, because expressions are represented as texts. This is the first textual machine we have implemented, and to improve it, a few more will be introduced in later posts.