2011年12月13日 星期二

Domain name changed

My current minjiezha.info domain name will be expired soon. I decided to use a new one minjiezha.com, and it will take effect in 24 hours.

2011年7月28日 星期四

Lazy CEK Machine

In the CEK Machine, though the substitution for an identifier is delayed, it doesn't mean it is a lazy machine. As described in Eager and Lazy Evaluation, a lazy evaluation strategy doesn't evaluate an argument before applying it to a function. However, in the CEK machine, it always evaluates the argument to a value, so it is still a call-by-value machine. If you are careful enough, you may already notice whenever an environment is created, the expression in its closure is always a value. The real power of environment is working with lazy evaluation. So in this post, I am going to implement the call-by-name and call-by-need strategies for the CEK machine.

Call by Name
It is quite easy to implement the call-by-name strategy based on the current call-by-value one. Since the machine no longer evaluates the argument before applying it to a function, the reduction rules related to applications should be modified(For a primitive operator, the operands are needed at the moment it is evaluated, so it stays unchanged.). In the call-by-value machine, when the control string is an application, first an ArgKK continuation is handled, then a FunKK continuation is handled. Now, the two steps, 2.1 and 2.2, should be combined into one, and the FunKK continuation type is removed.

2) If the control string is a value(abstraction or constant), check the type of continuation:
    2.1) If it is ArgKK, pop up the continuation, create a new environment that maps the function parameter to current closure, and set the function body and the new environment as the current expression and environment respectively;

However, the machine may need to evaluate the same argument many times now. Take expression (lambda x  + x x) ((lambda y y) 1) as an example, x is mapped to ((lambda y y) 1) after reducing the outer application. Then, when evaluating + x x, ((lambda y y) 1) is evaluated twice. The reason is that a parameter may be used several times in a function body, and whenever it is needed, the original argument it maps to should be evaluated. What if remember the result of the argument for the first time it is evaluated, so it doesn't bother to re-compute it later? I will talk this in next section.

Call by Need
The idea is quite straightforward. Remember the value for a closure, so next time the closure is needed, the value is retrieved directly. Considering the same example (lambda x + x x) ((lambda y y) 1), when the first x in + x x is needed, ((lambda y y) 1) is evaluated and the result is 1. After this, the environment which maps x to ((lambda y y) 1) is updated so x is mapped to 1. When the second x is needed, 1 is retrieved from the environment and no extra evaluation is performed.

However, the implementation is not that simple. When evaluating ((lambda y y) 1), I can't simply put it as the control string of current machine, because this may destroy the machine state and the machine can't go back to previous state after it is evaluated. Instead, a separate evaluation process should be performed on it. This is kind of a sub-machine, whose evaluation logic is the same, but it relies on some states from the parent machine, such as environments because the expression itself may contain free variables. In current evaluate() function, it initializes the machine state at the beginning and clean it up at the end. It was designed like this because no sub-machine was ever needed before. I am going to refactor it so it is fed with a machine state and focuses on the reduction logic, instead of managing the lifecycle of the machine state. One advantage is an expression can be put into a particular context for evaluation. It is more flexible. The interface is changed to:

// file: eval.c
static int _evaluate(State *state);

TreeNode * evaluate(TreeNode *expr) {
    Environment *globalEnv = buildGlobalEnvironment();
    State *state = cek_newState();
    state->closure = cek_newClosure(expr,globalEnv);

    TreeNode* result = NULL;
    if(_evaluate(state)) {
        // The result control string may contain free variables, need to 
        // substitute them using the environment for it.
        TreeNode *tmp = resolveFreeVariables(state->closure->expr,state->closure->env);
        if(tmp!=NULL) {
            // actually, tmp and state->closure->expr are the same expression
            result = tmp;
            state->closure->expr = NULL;
        }
    }
    cek_cleanup(state);
    return result;
}

And the code for looking up an identifier is:

// file: eval.c
static Closure* lookupVariable(const char *name, Environment *env) {
    if(env==NULL) return NULL;
    if(strcmp(name,env->name)==0) {
        Closure *closure = env->closure;
        if(closure->env!=NULL) {
            // evaluates the expression in its environment
            State *state = cek_newState();
            state->closure = cek_newClosure(duplicateTree(closure->expr),cek_newEnvironment("",cek_newClosure(NULL,NULL),closure->env));

            if(_evaluate(state)) {
                deleteTree(closure->expr);
                cek_deleteClosure(closure);
                // save the evaluated result
                env->closure = cek_newClosure(state->closure->expr,state->closure->env);
                state->closure->expr = NULL;
            }
            cek_cleanup(state);
        }
        return env->closure;
    }
    return lookupVariable(name,env->parent);
}

More Words about Closure and Environment
Another design improvement I want to mention is the global environment. Currently, if the control string is an identifier, the machine first tries to look it up in the environments and if not found it tries to resolve it from the builtin and standard libraries. Since a library maps identifiers to expressions, it is just like an environment. So the identifier lookup can be treated universally by introducing a global environment, which contains all the identifiers and functions provided by the evaluator. It is the root environment for all environments that are created during the evaluation and it is initialized when the machine is started. Thus, to handle a control string which is an identifier, the machine only needs to look it up in the environments. If it is a builtin or standard function, the machine can finally find it in the global environment.

A closure is an expression associated with an environment. The environment is where the expression is defined rather than it is evaluated. Let's consider the reduction steps of expression (lambda x  (lambda y (lambda x + x y) 2) x) 1.

(lambda x (lambda y (lambda x + x y) 2) x) 1
  => (lambda y (lambda x + x y) 2) x)                    A = {x=>(1,G), G}
  => (lambda x + x y) 2                                          B = {y=>(x,A), A}
  => + x y                                                               C = {x=>(2,B), B}
  => + 2 y                                                               C = {x=>(2,B), B}
  => + 2 1                                                               C = {x=>(2,B), B}  *
  => 3

The result is 3 rather than 4, because at step (*), y is mapped to x, and x is looked up in environment A, where it is defined, but not the latest environment C where is evaluated.

You can get the source code in branches call-by-name and call-by-need via github.

2011年7月23日 星期六

CEK Machine

In the textual machines I mentioned in previous posts, when reducing an application, without exception, everyone requires the substitution of an argument in place of all the occurences of an identifier in the abstraction body. The machine traverses the expression tree during substitution, and if the result is not a value, it needs to re-traverse it again to evaluate it. Take (lambda x + x x) 1 as an example, + x x is traversed to replace x with 1. After that, we get result + 1 1. To evaluate it, the expression + 1 1 is traversed again. It is inefficient to traverse an expression twice. We cannot get rid of this if we stick on the substitution when reducing an application. We can, however, remember the identifier and the argument at the moment and delay the substitution until it is actually needed.

To represent this delay substitution, a component that maps an identifier to an expression is needed. It is called an environment. Every expression in the machine is replaced by a pair, whose first part is an expression probably with free variables in it and the other part is an environment. It is called a closure. Of course, the environment could not map identifiers to expressions, because the machine never works on a single expression. It should map identifiers to closures. And the continuation also contains a closure. Now, the machine state changes to a control string, an environment and a continuation, so it is called a CEK machine. Here is the data structures introduced for environment and closure:

// file: cek_machine.h
struct envStruct {
    char *name;
    struct closureStruct *closure;
    struct envStruct *parent;
};

struct closureStruct {
    TreeNode *expr;
    struct envStruct *env;
};

typedef struct continuationStruct {
    ContinuationKind tag;
    Closure * closure;
    struct continuationStruct * next;
} Continuation;

Conceptually, an environment can contain several identifier-closure mappings. However, in the definition above an environment only has one such mapping. Since mappings are created when reducing applications and an abstraction only has one parameter, I keep one mapping in a standalone environment and arrange them hierarchically. When looking up for the mapped closure of an identifier, first try it in its associated environment, and if failed, try it in the parent one.

CEK Machine
There is no significant change to continuation actions. Some new actions, like creating a closure, deleting a closure, creating an environment, looking up in an environment, are very common in the reduction steps. By the way, an identifier is no long a value because it may be mapped to an expression, which can be further reduced. Here are the reduction rules:
1) If the control string is an identifier, look up the mapped closure in the environment, and set the expression and environment of the closure as current expression and environment respectively;
2) If the control string is a value(abstraction or constant), check the type of continuation:
    2.1) If it is FunKK, pop up the continuation, create a new environment that maps the function parameter to current closure, and set the function body and the new environment as the current expression and environment respectively;
    2.2) If it is ArgKK, set the continuation as FunKK, and switch the current expression and environment with these in the closure of continuation respectively;
    2.3) If it is OprKK, pop up the continuation, evaluate the primitive expression, and set the result as the current expression and set the current environment to null;
    2.4) If it is OpdKK, set next argument expression and its environment as current expression and environment respectively;
3) If the control string is an application, create and push a continuation of type ArgKK, whose closure consists of the argument expression and current environment, and set the abstractioin expression as current expression, with current environment unchanged;
4) If the control string is a primitive, create and push a continuation of type OpdKK, whose closure consists of the primitive expression and current environment, and set the first argument expression as current expression, with current environment unchanged.

Though implementation is straightforward according to the reduction rules, the code is getting complicated, so I am not duplicate it here. You can still access it on github.

A drawback of the CEK machine is no intermediate expression can be obtained after each step. What a pity! The machine always delays the substitution and expression is not actually reduced after each step. It does less work at the expense of losing immediate effects.

2011年6月10日 星期五

CK Machine

In CC and SCC machines, context expands and shrinks when traversing the expression tree in postorder. The nodes in the bottom of the tree are always evaluated before the upper ones. From another perspective, treating the expression as a text string, it is evaluated from innermost to outermost. In the case that the control string is an application, a new innermost application is pushed into the context. In the case that the control string is a value, the innermost application in the context is popped up and the machine decides what to do next by inspecting its shape. I used a first-in last-out list to save the contexts, because the reduction steps only depend on the innermost application, that is the latest context, but not on the rest of the data structure. Compared to the recursive implementation, this iterative evaluator doesn't accumulate runtime stack frames, but this data structure works as an explicit stack in the single runtime stack frame. If the control string is an application, a new "stack frame" is pushed. If it is a value, the top "stack frame" is popped up. Using the control string and information in the popped up "stack frame", the machine decides subsequent actions, performing a reduction or pushing the "stack frame" back. Here is the code snippet making the decision in SCC machine:
// File: eval.c line 92-106 

// control string is the right child
    if(state->controlStr==state->context->expr->children[1]) {
        // pop an expression from the context
        state->controlStr = state->context->expr;
        ctx = state->context;
        state->context = state->context->next;
        cc_deleteContext(ctx);
        ctx = NULL;
        if(!performReduction(state)) {
            return NULL;
        }
    } else {
        state->controlStr = state->context->expr->children[1];
    }
If the control string is the left child of the innermost application, which means it is the function part, the next action is to evaluate the argument. If it is the right child, which means it is the argument, the next action is to perform reduction on the innermost application. I have to inspect such relationship between the control string and innermost application because I didn't store it in the context, although it is available when the context was created. Look at the following code snippet:
// File: eval.c line 115-119

if(!isValue(state->controlStr->children[0])) {
    state->controlStr = state->controlStr->children[0];
}else {
    state->controlStr = state->controlStr->children[1];
}
The shape of the application was clearly known when setting the control string. At this phase, the machine was able to determine the actions once control string is evaluated. If the actions are saved in the newly created context, when popped up later the saved actions can be performed directly without inspecting the shape of the application. For ease of implementation, I will tag each context indicating what to do next when it is popped up. This tagged context is called continuation. The definition is:
// File: ck_machine.h
/* Kind of each continuation. */
typedef enum {
    FunKK, ArgKK, OprKK, OpdKK
} ContinuationKind;

/* Use a LIFO list to represent the continuation. */
typedef struct continuationStruct {
    ContinuationKind tag;
    TreeNode * expr;
    struct continuationStruct * next;
} Continuation;

The innermost application is kept in a continuation as expr. Four actions are defined, with the first two for applications and the other two for primitive applications. If its is FunKK, the next action is to perform reduction on the application; If it is ArgKK, the next action is to evaluate the argument expression; If it is OprKK, the next action is to evaluate the primitive application; If it is OpdKK, the next action is to evaluate the next argument of the primitive application.

CK Machine
Now, the state of the textual machine is changed to a pair of control string and continuation, and this machine is called a CK machine. The reduction rules are similar to that of CC/SCC machine, except for manipulations on continuations:
    1) If the control string is a value(identifier, constant or abstraction), check the type of continuation:
        1.1) If it is FunKK or OprKK, pop up the continuation, set the expression in it as control string, and perform reductions;
        1.2) If it is ArgKK, set the continuation to FunKK, and set the argument expression as control string;
        1.3) If it is OpdKK, set the continuation to OprKK, and set the next argument expression as control string;
    2) If the control string is an application, create and push a continuation of type ArgKK, and set the abstraction expression as control string;
    3) If the control string is an primitive, create and push a continuation of type OpdKK, and set the first argument as control string.

The primitive application only supports two arguments till now, so the next argument just means the second argument. Here is the code implementing the reduction rules:
// File: eval.c
TreeNode * evaluate(TreeNode *expr) {
    State * state = ck_newState();
    state->controlStr = expr;

    Continuation * ctn = NULL;
    while(!ck_canTerminate(state)) {
        if(isValue(state->controlStr)) {
            switch(state->continuation->tag) {
                case FunKK: // pop up the continuation
                case OprKK:
                    state->controlStr = state->continuation->expr;
                    ctn = state->continuation;
                    state->continuation = ctn->next;
                    ck_deleteContinuation(ctn);
                    ctn = NULL;
                    if(!performReduction(state)) {
                        return NULL;
                    }
                    break;
                case ArgKK: // change continuation to FunKK
                    state->continuation->tag = FunKK;
                    state->controlStr = state->continuation->expr->children[1];
                    break;
                case OpdKK: // change to OprKK
                    state->continuation->tag = OprKK;
                    state->controlStr = state->continuation->expr->children[1];
                    break;
                default:
                    fprintf(errOut,"Error: Unknown continuation tag.\n");
                    ck_cleanup(state);
                    return NULL;
                    
            }
        }else {
            if(state->controlStr->kind==AppK) { // push an ArgKK continuation
                ctn = ck_newContinuation(ArgKK);
            }else if(state->controlStr->kind==PrimiK) { // push OpdKK continuation
                ctn = ck_newContinuation(OpdKK);
            }
            ctn->expr = state->controlStr;
            ctn->next = state->continuation;
            state->continuation = ctn;
            state->controlStr = state->controlStr->children[0];
            ctn = NULL;
        }
        #ifdef DEBUG
            // print intermediate steps
            if(!ck_canTerminate(state)) {
                fprintf(out,"-> ");
                printExpression(ck_getProgram(state),out);
                fprintf(out,"\n");
            }
        #endif
    }

    TreeNode* result = state->controlStr;
    ck_deleteState(state);
    return result;
}

The implementation is quite simple but I want to add some words about the CC/SCC and CK machine. From the code and reduction rules, the CK machine looks quite similar to that of CC/SCC machine, and there is only a tiny difference, a tag, in the data structure for context and continuation. However, the ideas behind them are really distinct. For a context, it stores the innermost application expression, which can be thought as data. While for a continuation, besides such data, it also keeps the information for actions, which can be thought as code. In the above implementation, the continuation is attached with a tag, but for other implementations, functions or programs may be attached to make it more usable and flexible. Furthermore, the programmer can even provide a program as actions to continuations explicitly. Another significant difference is: with context, the applications can only be evaluated serially from innermost to outermost, because the relationship of the innermost application and control string is necessary to decide what the next action is. With continuation, this is not required because action is saved in the continuation, and each continuation itself can decide the next action. So in some circumstances, several continuations can be skipped and another continuation can be picked to continue the evaluation. This is essential to control flow and error handling. For example, if an error happens in an application, the machine should go to the function that handles the error, instead of evaluating the next expression in current application. I will go back to this topic in later posts.

2011年5月28日 星期六

CC and SCC Machine

An obvious inefficiency of our first textual machine is the repeated traversal of the expression tree to find the redex. After each reduction step, it starts over again from the root node. We really don't have to do this, because the next redex is often the reduction result of current redex, or at least, closes to the current one. Let's consider the evaluation of the following expression:
(+ ((lambda x (lambda y y) x) 1) 1)
    => (+ ((lambda y y) 1) 1)
    => (+ 1 1)
    => 2

The redex for each step is underlined. In the first two steps, the redex for next step is just the evaluation result of the current redex. And in the third step, the next redex is the parent expression of the result of previous step. Actually, we can identify the next redex by checking the result of current one or its parent for every expression. Suppose the evaluation result of a redex is an application, if either of its children is not a value, the non-value child expression will be the next redex. Otherwise, the result itself will be the next one. Suppose the result is a value, its parent or the other child will probably be the next redex. We didn't do this kind of checking in the textual machine, because we only had the complete expression as our machine state, and the machine were not aware of the result and its parent. Though this could be obtained in the implementation, from the model view, the machine does only know the complete expression. So in this post, two new machines using the strategy will be defined to improve the efficiency.

CC Machine
According to the discussion above, if the textual machine knows the redex and its parent, it can easily decide what the next redex is, rather than traversing the entire expression tree again. To keep it running until getting the final result, not just the parent of the redex, but also the parent's parent should be recorded. The redex is the machine currently interested in, and it is called control string. Except the redex, the other part of the expression is called evaluation context, from which all the parents can be obtained. The two elements are paired together to form the machine's state. So this machine is called CC machine. Combining the control string and context yields the complete expression, which is the state of the previous machine. I'd like to call the complete expression as program. Initially, the program is the control string, and the context is empty. The termination condition of this machine is when the control string is a value and context is empty. Here is the reduction rules for this machine:
    1) If the control string is a value(identifier, constant or abstraction), remove its parent from the context, and set the parent as control string;
    2) If the control string is an application or primitive, and at least one of its children is not a value, put itself into the context and set the non-value child as control string;
    3) If the control string is an application or primitive, and both of its children are values, perform reductions on it and set the result as control string.

To find the first redex, we still need to traverse the expression tree, but the traversed nodes should be remembered in the context. Since only the parent of control string is interested for each step, and from each traversed parent, we can navigate to any part of the program, I use a FILO(First-In-Last-Out) list of the traversed nodes to represent the context(Actually, we don't need such kind of data structure if a pointer to parent node is added to each node, but this requires changes to the parser, so I don't choose this option). From the implementation view, the machines perform evaluations by popping and pushing control string into or from the context to find the redex. Here is the definition for machine state and the context:
// file: cc_machine.h
/* Use a LIFO list to represent the context. */
typedef struct contextStruct {
    TreeNode * expr;
    struct contextStruct * next;
} Context;

/* Machine state is a pair of control string and the context. */
typedef struct stateStruct {
    TreeNode * controlStr;
    Context * context;
} State;


From the reductions rules, we can easily derive the actions on context:
    1) If the control string is a value, pop the head node from the list and set the node as control string;
    2) If the control string is an application or primitive, and at least one of its children is not a value, push it into the head of the list and set the non-value child as control string;
    3) If the control string is an application or primitive, and both of its children are values, perform reductions on it and set the result as control string and no action for context.

Here is the listing of code:
// file: eval.c
TreeNode * evaluate(TreeNode *expr) {
    State * state = cc_newState();
    state->controlStr = expr;

    Context * ctx = NULL;
    while(!cc_canTerminate(state)) {
        if(isValue(state->controlStr)) {
            // pop an expression from the context
            state->controlStr = state->context->expr;
            ctx = state->context;
            state->context = state->context->next;
            cc_deleteContext(ctx);
            ctx = NULL;
        }else {
            if(!isValue(state->controlStr->children[0])
                || !isValue(state->controlStr->children[1])) {
                // push the current expression into context
                ctx = cc_newContext();
                ctx->expr = state->controlStr;
                ctx->next = state->context;
                state->context = ctx;
                if(!isValue(state->controlStr->children[0])) {
                    state->controlStr = state->controlStr->children[0];
                }else {
                    state->controlStr = state->controlStr->children[1];
                }
            } else { // evaluate control string
                if(state->controlStr->kind==AppK) {
                    if(state->controlStr->children[0]->kind==ConstK) {
                        fprintf(errOut, "Error: cannot apply a constant to any argument.\n");
                        fprintf(errOut, "\t\t");
                        printExpression(state->controlStr,errOut);
                        cc_cleanup(state);
                        return NULL;
                    }else if(state->controlStr->children[0]->kind==IdK) {
                        // find function from builtin and standard library
                        TreeNode* fun = resolveFunction(state->controlStr->children[0]->name);
                        if(fun==NULL) {
                            fprintf(errOut, "Error: %s is not a predefined function.\n", state->controlStr->children[0]->name);
                            cc_cleanup(state);
                            return NULL;
                        }
                        deleteTree(state->controlStr->children[0]);
                        state->controlStr->children[0] = fun;
                    } else {
                        TreeNode *tmp = betaReduction(state->controlStr);
                        if(state->context!=NULL) {
                            if(state->context->expr->children[0]==state->controlStr) {
                                state->context->expr->children[0] = tmp;
                            }else {
                                state->context->expr->children[1] = tmp;
                            }
                        }
                        state->controlStr = tmp;
                    }
                }else if(state->controlStr->kind==PrimiK) {
                    // only perform primitive operation if operands are constants
                    if(state->controlStr->children[0]->kind==ConstK
                        && state->controlStr->children[1]->kind==ConstK) {
                        TreeNode* tmp  = evalPrimitive(state->controlStr);
                        if(state->context!=NULL) {
                            if(state->context->expr->children[0]==state->controlStr) {
                                state->context->expr->children[0] = tmp;
                            } else {
                                state->context->expr->children[1] = tmp;
                            }
                        }
                        deleteTree(state->controlStr);
                        state->controlStr = tmp;
                    } else {
                        fprintf(errOut, "Error: %s can only be applied on constants.\n", state->controlStr->name);
                        cc_cleanup(state);
                        return NULL;
                    }
                }else {
                    fprintf(errOut,"Error: Cannot evaluate unkown expression kind.\n");
                    cc_cleanup(state);
                    return NULL;

                }
            }
        }
        #ifdef DEBUG
            // print intermediate steps
            if(!cc_canTerminate(state)) {
                fprintf(out,"-> ");
                printExpression(cc_getProgram(state),out);
                fprintf(out,"\n");
            }
        #endif
    }

    TreeNode* result = state->controlStr;
    cc_deleteState(state);
    return result;
}
Let's try it:
$ ./main
Welcome to Lambda Calculus Evaluator.
Press Ctrl+C to quit.

> (lambda x x) (lambda y y) 1
-> (lambda x x) (lambda y y) 1
-> (lambda y y) 1
-> (lambda y y) 1
-> 1

SCC Machine
Examining the reduction steps at the end of the last section, step 2 and 3 are the same. That's because after step 2, the control string is (lambda y y) and the context is the parent of it. Based on the rules, the machine pops the parent and set it as control string. The machine state is changed but not actual reduction is performed on the program, so the program for both steps are the same. After that, beta-reduction is performed on the control string. Actually, we only have two possible actions after step 2: performing beta-reduction or primitive evaluation on the parent node or setting the other child as control string. We can simplify the process by checking the parent node so those two steps can be combined into one. Rule 1) of the CC machine is now revised to:
    1') If the control string is a value and the other child of its parent is a value too, remove its parent from the context, perform reductions to it and set the result as control string. Otherwise, set the other child as control string.

The code needs only simple changes, so I won't duplicate it here. Here is the evaluation steps for the same expression, and the duplicated step is gone:
$ ./main
Welcome to Lambda Calculus Evaluator.
Press Ctrl+C to quit.

> (lambda x x) (lambda y y) 1
-> (lambda x x) (lambda y y) 1
-> (lambda y y) 1
-> 1

Compared to CC machine, SCC machile always has less steps. All the code is available at here.