Herb is the author of more than two dozen computer books, with topics ranging from C to Modula-2. This article is based on his book Born to Code in C (Osborne/McGraw-Hill). He can be reached at RR #1, Box 130, Mahomet, IL 61853.
In this article, I develop a C interpreter that can execute a subset of K&R ANSI C. The interpreter not only is functional as presented, but is designed so that you can easily enhance and extend it. In fact, you can even add features not found in ANSI C if you choose. By the time you finish reading this article, you’ll have a C interpreter you can use and enlarge, and you will have gained considerable insight into the structure of the C language itself.
Although ANSI C has only 32 keywords, it is a rich and powerful language. It would, of course, take far more than a single article to fully describe and implement an interpreter for the entire C language. Instead, this C interpreter understands a fairly narrow subset of the language. Table 1 lists the features that are implemented in it.
Parameterized functions with local variables
In order to simplify and shorten the source code to the interpreter, I have imposed one small restriction on the C grammar: The targets of the if, while, do, and for must be blocks of code surrounded by beginning and ending curly braces. You may not use a single statement. For example, my interpreter will not correctly interpret code in Example 1; instead, you must write code like that in Example 2. Because the objects of the program control statements are often blocks of code anyway, this restriction does not seem too harsh. (With a little effort, though, you can remove this restriction.)
for(a=0; a<10; a=a+1) for(b=0; b<10; b=b+1) for(c=0; c<10; c=c+1) puts("hi");
for(a=0; a<10; a=a+1) { for(b=0; b<10; b=b+1) { for(c=0; c<10; c=c+1) { puts("hi"); } } }
To create the C interpreter, we first construct the expression evaluator and the piece of code that reads and analyzes expressions, the expression parser.
There are several different ways to design an expression parser for C. Many commercial compilers use a table-driven parser that is generally created by a parser-generator program. While table-driven parsers are usually faster than other methods, they are hard to create by hand. For this C interpreter I will use a recursive-descent parser, which is essentially a collection of mutually recursive functions that process an expression. If the parser is used in a compiler, then its function is to generate the proper object code that corresponds to the source code. In an interpreter, however, the object of the parser is to evaluate a given expression.
Fundamental to all interpreters (and compilers, for that matter) is a special function that reads the source code and returns the next logical symbol from it. For historical reasons, these logical symbols are generally referred to as tokens, and computer languages in general (and C in particular) define programs in terms of tokens. This C interpreter categorizes tokens as shown in Table 2. The function that returns tokens from the source code for the C interpreter is called get_token() and is shown in Listing One, lines 313 through 423. The get_token() function uses the global data and enumeration types in Figure 1.
char *prog; /* points to current location in source code */ extern char *p_buf; /* points to start of program buffer */ char token[80]; /* holds string representation of token */ char token_type; /* contains the type of token*/ char tok; /* holds the internal representation of token if it is a keyword */ enum tok_types {DELIMITER, IDENTIFIER, NUMBER, KEYWORD, TEMP, STRING, BLOCK}; enum double_ops {LT=1, LE, GT, GE, EQ, NE}; /* These are the constants used to call sntx_err( ) when a syntax error occurs. Add more if you like. NOTE: SYNTAX is a generic error message used when nothing else seems appropriate. */ enum error_msg {SYNTAX, UNBAL_PARENS, NO_EXP, EQUALS_EXPECTED, NOT_VAR, PARAM_ERR, SEMI_EXPECTED, UNBAL_BRACES, FUNC_UNDEF, TYPE_EXPECTED, NEST_FUNC, RET_NOCALL, PAREN_EXPECTED, WHILE_EXPECTED, QUOTE_EXPECTED, NOT_TEMP, TOO_MANY_LVARS};
Token Type Includes delimiters punctuation and operators keywords keywords strings quoted strings identifiers variable and function names number numeric constant block {or}
The current location in the source code is pointed to by prog. The p_buf pointer is unchanged by the interpreter and always points to the start of the program being interpreted. The get_token( ) function begins by skipping over all white space, including carriage returns and line feeds. Because no C token (except for a quoted string or character constant) contains a space, spaces must be bypassed. The get_token( ) function also skips over comments. Next, the string representation of each token is placed into token, its type (as defined by the tok_types enumeration) is put into token_type, and, if the token is a keyword, its internal representation is assigned to tokvia the look_up( ) function (shown in the full parser listing, Listing One). You can see by looking at get_token( ) that it converts C’s two-character relational operators into their corresponding enumeration value. Although not technically necessary, this step makes the parser easier to implement. Finally, if the parser encounters a syntax error, it calls the function sntx_err( ) with an enumerated value that corresponds to the type of error found. The sntx_err( ) function is also called by other routines in the interpreter whenever an error occurs. The sntx_err() function is shown here in Listing One, lines 274 through 311.
The entire code for the C interpreter recursive descent parser is shown in Listing One (PARSER.C), along with some necessary support functions, global data, and data types. As listed, this code is designed to go into its own file.
The atom( ) function and the functions that begin with eval_exp implement the production rules for C expressions. To verify this, you might want to mentally execute the parser using a simple expression.
The atom( ) function finds the value of an integer constant or variable, a function, or a character constant. The two kinds of functions that may be present in the source code are user-defined or library. If a user-defined function is encountered, its code is executed by the interpreter in order to determine its value. (The calling of a function will be discussed in the next section.) If the function is a library function, however, then its address is first looked up by the internal_func( ) function and is then accessed via its interface function. The library functions and the addresses of their interface functions are held in the internal_func array in lines 56 through 73.
The heart of the C interpreter is developed in this section. Before the interpreter can actually start executing a program, however, a few clerical tasks must be performed. Some method must be devised to allow execution to begin at the right spot, and all global variables must be known and accounted for before main( ) begins executing. It is important, though not technically necessary, that the location of each function defined in the program be known so that a call to a function can be as fast as possible. If this step is not performed, a lengthy sequential search of the source code will be needed to find the entry point to a function each time it is called.
The solution to these problems is the interpreter prescan. Prescanners (or preprocessors, as they are sometimes called, though they have little resemblance to a C compiler’s preprocessor) are used by many commercial interpreters regardless of what language they are interpreting. A prescanner reads the source code to the program before it is executed and performs whatever tasks can be done prior to execution. In this C interpreter, the prescanner performs two important jobs: First, it finds and records the location of all user-defined functions including main( ). Second, it finds and allocates space for all global variables. Here, the function that performs the prescan is called prescan( ) and is shown in Listing Two, LITTLEC.C, lines 190 through 228.
Global variables are stored in a global variable table called global_vars by decl_global( ), lines 43 through 50 and lines 240 through 253. The integer gvar_index(line 76) holds the location of the next free element in the array. The location of each user-defined function is put into the func_table array, shown in lines 52 through 55. The func_index variable (line 75) holds the index of the next free location in the table.
The main( ) function to the C interpreter, shown in lines 92 through 118, loads the source code, initializes the global variables, calls prescan( ), primes the interpreter for the call to main( ), and then executes call( ), which begins execution of the program.
The interp_block( ) function is the heart of the interpreter. This function decides what action to take based upon the next token in the input stream. It is designed to interpret one block of code and then return. The interp_block( ) function is shown in lines 119 through 174.
Ignoring calls to functions like exit( ), a program ends when the last curly brace (or a return) in main( ) is encountered. It does not necessarily end with the last line of source code. This is one reason that interp_block( ) executes only a block of code, and not the entire program. Another reason is that, conceptually, C consists of blocks of code. Therefore, interp_block( ) is called each time a new block of code is encountered. This includes both function calls and blocks begun by various C statements, such as an ifblock. This means that the interpreter may call interp_block( ) recursively in the process of executing a program.
When the interpreter encounters an int or char keyword, it calls decl_local( ) to create storage for a local variable. Each time a local variable is encountered, its name, type, and value (initially zero) are pushed onto the stack using local_push( ). The global variable lvartos indexes the stack. (There is no corresponding “pop” function. Instead, the local variable stack is reset each time a function returns.) The decl_local and local_push( ) functions are shown in lines 254 through 353.
All function calls (except the initial call to main( )) take place through the expression parser from the atom( ) function by a call to call( ). It is the call( ) function that actually handles the details of calling a function. The call( ) function is shown in Listing Two, lines 269 through 337, along with two support functions.
Let’s return to the expression parser for a moment. When an assignment statement is encountered, the value of the right side of the expression is computed and assigned to the variable on the left using a call to assign( ). Given a program like the one in Example 3, how does the assign( ) function know which variable is being assigned a value in each assignment? The answer is simple: First, in C, local variables take priority over global variables of the same name. Second, local variables are not known outside their own function. To see how we can use these rules to resolve the above assignments, examine the assign( ) function shown in lines 369 through 388.
int count; main () { int count; count = 100; f(); } f() { int count; count = 99; }
Now that the basic structure of the C interpreter is in place, it is time to add some control statements. Each time a keyword statement is encountered inside of interp_block( ), an appropriate function is called, which processes that statement. One of the easiest is the if. The if statement is processed by exec_if( ), shown in lines 419 through 438.
Like the if, it is easy to interpret a while loop. The function that actually performs this task is called exec_while( ) and is shown in lines 439 through 454.
A do/while loop is processed much like the while. When interp_block( ) encounters a do statement, it calls exec_do( ), shown in lines 455 through 469.
The main difference between the do/ while and the while loops is that the do/while always executes its block of code at least once because the conditional expression is at the bottom of the loop. Therefore, exec_do( ) first saves the location of the top of the loop into temp and then calls interp_block( ), recursively, to interpret the block of code associated with the loop. When interp_block( ) returns, the corresponding while is retrieved and the conditional expression is evaluated. If the condition is true, prog is reset to the top of the loop; otherwise, execution continues.
The interpretation of the for loop poses a more difficult challenge than the other constructs. This is partly because the structure of the C for is definitely designed with compilation in mind. The main trouble is that the conditional expression of the for must be checked at the top of the loop, but the increment portion occurs at the bottom of the loop. Therefore, even though these two pieces of the for loop occur next to each other in the source code, their interpretation is separated by the block of code being iterated. With a little work, however, the for can be correctly interpreted.
When interp_block( ) encounters a for statement, exec_for( ) is called. This function is shown in lines 482 through 514.
Because the C programs executed by the interpreter are never compiled and linked, any library routines they use must be handled directly by the interpreter. The best way to do this is to create an interface function, which the C interpreter calls when a library function is encountered. This interface function sets up the call to the library function and handles any return values.
Due to space limitations, the interpreter contains only five library functions: getche( ), putch( ), puts( ), print( ), and getnum( ). Of these, only puts( ) is described by the ANSI standard, and it outputs a string to the screen. The putch( ) function is a common extension to C for interactive environments. It waits for and returns a key struck at the keyboard. Unlike most implementations of putchar( ) it does not line-buffer input. This function is found in many compilers, such as Turbo C, QuickC, and Lattice C. The putch( ) is also defined by many compilers designed for use in an interactive environment. It outputs a single character argument to the console and does not buffer output.
The functions getnum( ) and print( ) are my own creations. The getnum( ) function returns the integer equivalent of a number entered at the keyboard. The print( ) function is a handy function that can output either a string or an integer argument to the screen. It is an example of a function that would be difficult to implement in a compiled environment, but is easy to create for an interpreted one. The five library functions are shown in Table 3 in their prototype forms. The C interpreter library routines are listed in Listing Three (LCLIB.C).
int getche(void); /* read a character from keyboard and return its value */ int putch(char ch); /* write a character to the screen */ int puts(char*s); /* write a string to the screen */ int getnum(void); /* read an integer from the keyboard and return its value */ int print(char*s); /* write a string to the screen */ or int print(int i); /* write an integer to the screen */
To add additional library functions, first enter their names and addresses of their interface functions into the internal func( ) array. Next, following the lead of the functions shown here, create appropriate interface functions.
Once you have compiled all three files that make up this interpreter, compile and link them together. If you use Turbo C, you can use a sequence like this:
tcc -c parser.c tcc -c lclib.c tcc littlec.c parser.obj lclib.obj
If you use Microsoft C, use this sequence:
cl -c parser.c cl -c lclib.c cl littlec.c parser.obj lclib.obj
If you use a different C compiler, follow the instructions that come with it.
The program in Listing Four demonstrates the various features of my C.
The C interpreter presented here was designed with transparency of operation in mind. The goal was to develop an interpreter that could be easily understood with the least amount of effort, and to design it so that it could be easily expanded. As such, the interpreter is not particularly fast or efficient. The basic structure of the interpreter is correct, however, and you can increase its speed of execution in the following ways.
One potential improvement is with the lookup routines for variables and functions. Even if you convert these items into integer tokens, the current approach to searching for them relies upon a sequential search. You could, however, substitute some other, faster method, such as a binary tree or some sort of hashing method.
Two general areas in which you can expand and enhance the C interpreter are C features and ancillary features. Among the C statements you can add to the interpreter are additional action statements, such as the switch, the goto, and the break and continue statements.
Another type of C statement you can add is new data types. The interpreter already contains the basic hooks for additional data types. For example, the var_type structure already contains a field for the type of variable. To add other elementary types (that is, float, double, long), increase the size of the value field to the size of the largest element you wish to hold.
Supporting pointers is no harder than supporting any other data type; however, you will need to add support for the pointer operators to the expression parser. This will involve some lookahead. Once you have implemented pointers, arrays will be easy. The space of an array should be allocated dynamically using malloc( ), and a pointer to the array should be stored in the value field of var_type.
The addition of structures and unions poses a more difficult problem. The easiest way to handle them is to use malloc( ) to allocate space for them, and to use a pointer to the object in the value field of the var_type structure. (You will also need special code to handle the passing of structures and unions as parameters.) To handle different return types for functions, add a type field to the func_type structure, which defines what type of data a function returns.
All source code for articles in this issue is available on a single disk. To order, send .95 (Calif. residents add sales tax) to Dr. Dobb’s Journal, 501 Galveston Dr., Redwood City, CA 94063, or call 800-356-2002 (from inside Calif.) or 800-533-4372 (from outside Calif.). Please specify the issue number and format (MS-DOS, Macintosh, Kaypro).
BUILDING YOUR OWN C INTERPRETER by Herbert Schildt
[LISTING ONE]
/* Recursive descent parser for integer expressions which may include variables and function calls. */ #include "setjmp.h" #include "math.h" #include "ctype.h" #include "stdlib.h" #include "string.h" #include "stdio.h" #define NUM_FUNC 100 #define NUM_GLOBAL_VARS 100 #define NUM_LOCAL_VARS 200 #define ID_LEN 31 #define FUNC_CALLS 31 #define PROG_SIZE 10000 #define FOR_NEST 31 enum tok_types {DELIMITER, IDENTIFIER, NUMBER, KEYWORD, TEMP, STRING, BLOCK}; enum tokens {ARG, CHAR, INT, IF, ELSE, FOR, DO, WHILE, SWITCH, RETURN, EOL, FINISHED, END}; enum double_ops {LT=1, LE, GT, GE, EQ, NE}; /* These are the constants used to call sntx_err() when a syntax error occurs. Add more if you like. NOTE: SYNTAX is a generic error message used when nothing else seems appropriate. */ enum error_msg {SYNTAX, UNBAL_PARENS, NO_EXP, EQUALS_EXPECTED, NOT_VAR, PARAM_ERR, SEMI_EXPECTED, UNBAL_BRACES, FUNC_UNDEF, TYPE_EXPECTED, NEST_FUNC, RET_NOCALL, PAREN_EXPECTED, WHILE_EXPECTED, QUOTE_EXPECTED, NOT_TEMP, TOO_MANY_LVARS}; extern char *prog; /* current location in source code */ extern char *p_buf; /* points to start of program buffer */ extern jmp_buf e_buf; /* hold environment for longjmp() */ /* An array of these structures will hold the info associated with global variables. */ extern struct var_type { char var_name[32]; enum variable_type var_type; int value; } global_vars[NUM_GLOBAL_VARS]; /* This is the function call stack. */ extern struct func_type { char func_name[32]; char *loc; /* location of function entry point in file */ } func_stack[NUM_FUNC]; /* Keyword table */ extern struct commands { char command[20]; char tok; } table[]; /* "Standard library" functions are declared here so they can be put into the internal function table that follows. */ int call_getche(void), call_putch(void); int call_puts(void), print(void), getnum(void); struct intern_func_type { char *f_name; /* function name */ int (* p)(); /* pointer to the function */ } intern_func[] = { "getche", call_getche, "putch", call_putch, "puts", call_puts, "print", print, "getnum", getnum, "", 0 /* null terminate the list */ }; extern char token[80]; /* string representation of token */ extern char token_type; /* contains type of token */ extern char tok; /* internal representation of token */ extern int ret_value; /* function return value */ void eval_exp(int *value), eval_exp1(int *value); void eval_exp2(int *value); void eval_exp3(int *value), eval_exp4(int *value); void eval_exp5(int *value), atom(int *value); void eval_exp0(int *value); void sntx_err(int error), putback(void); void assign_var(char *var_name, int value); int isdelim(char c), look_up(char *s), iswhite(char c); int find_var(char *s), get_token(void); int internal_func(char *s); int is_var(char *s); char *find_func(char *name); void call(void); /* Entry point into parser. */ void eval_exp(int *value) { get_token(); if(!*token) { sntx_err(NO_EXP); return; } if(*token==';') { *value = 0; /* empty expression */ return; } eval_exp0(value); putback(); /* return last token read to input stream */ } /* Process an assignment expression */ void eval_exp0(int *value) { char temp[ID_LEN]; /* holds name of var receiving the assignment */ register int temp_tok; if(token_type==IDENTIFIER) { if(is_var(token)) { /* if a var, see if assignment */ strcpy(temp, token); temp_tok = token_type; get_token(); if(*token=='=') { /* is an assignment */ get_token(); eval_exp0(value); /* get value to assign */ assign_var(temp, *value); /* assign the value */ return; } else { /* not an assignment */ putback(); /* restore original token */ strcpy(token, temp); token_type = temp_tok; } } } eval_exp1(value); } /* This array is used by eval_exp1(). Because some compilers cannot initialize an array within a function it is defined as a global variable. */ char relops[7] = { LT, LE, GT, GE, EQ, NE, 0 }; /* Process relational operators. */ void eval_exp1(int *value) { int partial_value; register char op; eval_exp2(value); op = *token; if(strchr(relops, op)) { get_token(); eval_exp2(&partial_value); switch(op) { /* perform the relational operation */ case LT: *value = *value < partial_value; break; case LE: *value = *value <= partial_value; break; case GT: *value = *value > partial_value; break; case GE: *value = *value >= partial_value; break; case EQ: *value = *value == partial_value; break; case NE: *value = *value != partial_value; break; } } } /* Add or subtract two terms. */ void eval_exp2(int *value) { register char op; int partial_value; eval_exp3(value); while((op = *token) == '+' || op == '-') { get_token(); eval_exp3(&partial_value); switch(op) { /* add or subtract */ case '-': *value = *value - partial_value; break; case '+': *value = *value + partial_value; break; } } } /* Multiply or divide two factors. */ void eval_exp3(int *value) { register char op; int partial_value, t; eval_exp4(value); while((op = *token) == '*' || op == '/' || op == '%') { get_token(); eval_exp4(&partial_value); switch(op) { /* mul, div, or modulus */ case '*': *value = *value * partial_value; break; case '/': *value = (*value) / partial_value; break; case '%': t = (*value) / partial_value; *value = *value-(t * partial_value); break; } } } /* Is a unary + or -. */ void eval_exp4(int *value) { register char op; op = ''; if(*token=='+' || *token=='-') { op = *token; get_token(); } eval_exp5(value); if(op) if(op=='-') *value = -(*value); } /* Process parenthesized expression. */ void eval_exp5(int *value) { if((*token == '(')) { get_token(); eval_exp0(value); /* get subexpression */ if(*token != ')') sntx_err(PAREN_EXPECTED); get_token(); } else atom(value); } /* Find value of number, variable or function. */ void atom(int *value) { int i; switch(token_type) { case IDENTIFIER: i = internal_func(token); if(i!= -1) { /* call "standard library" function */ *value = (*intern_func[i].p)(); } else if(find_func(token)){ /* call user-defined function */ call(); *value = ret_value; } else *value = find_var(token); /* get var's value */ get_token(); return; case NUMBER: /* is numeric constant */ *value = atoi(token); get_token(); return; case DELIMITER: /* see if character constant */ if(*token=='\'') { *value = *prog; prog++; if(*prog!='\'') sntx_err(QUOTE_EXPECTED); prog++; get_token(); } return; default: if(*token==')') return; /* process empty expression */ else sntx_err(SYNTAX); /* syntax error */ } } /* Display an error message. */ void sntx_err(int error) { char *p, *temp; int linecount = 0; register int i; static char *e[]= { "syntax error", "unbalanced parentheses", "no expression present", "equals sign expected", "not a variable", "parameter error", "semicolon expected", "unbalanced braces", "function undefined", "type specifier expected", "too many nested function calls", "return without call", "parentheses expected", "while expected", "closing quote expected", "not a string", "too many local variables" }; printf("%s", e[error]); p = p_buf; while(p != prog) { /* find line number of error */ p++; if(*p == '\r') { linecount++; } } printf(" in line %d\n", linecount); temp = p; for(i=0; i<20 && p>p_buf && *p!='\n'; i++, p--); for(i=0; i<30 && p<=temp; i++, p++) printf("%c", *p); longjmp(e_buf, 1); /* return to save point */ } /* Get a token. */ get_token(void) { register char *temp; token_type = 0; tok = 0; temp = token; *temp = ''; /* skip over white space */ while(iswhite(*prog) && *prog) ++prog; if(*prog=='\r') { ++prog; ++prog; /* skip over white space */ while(iswhite(*prog) && *prog) ++prog; } if(*prog=='') { /* end of file */ *token = ''; tok = FINISHED; return(token_type=DELIMITER); } if(strchr("{}", *prog)) { /* block delimiters */ *temp = *prog; temp++; *temp = ''; prog++; return (token_type = BLOCK); } /* look for comments */ if(*prog=='/') if(*(prog+1)=='*') { /* is a comment */ prog += 2; do { /* find end of comment */ while(*prog!='*') prog++; prog++; } while (*prog!='/'); prog++; } if(strchr("!<>=", *prog)) { /* is or might be a relation operator */ switch(*prog) { case '=': if(*(prog+1)=='=') { prog++; prog++; *temp = EQ; temp++; *temp = EQ; temp++; *temp = ''; } break; case '!': if(*(prog+1)=='=') { prog++; prog++; *temp = NE; temp++; *temp = NE; temp++; *temp = ''; } break; case '<': if(*(prog+1)=='=') { prog++; prog++; *temp = LE; temp++; *temp = LE; } else { prog++; *temp = LT; } temp++; *temp = ''; break; case '>': if(*(prog+1)=='=') { prog++; prog++; *temp = GE; temp++; *temp = GE; } else { prog++; *temp = GT; } temp++; *temp = ''; break; } if(*token) return(token_type = DELIMITER); } if(strchr("+-*^/%=;(),'", *prog)){ /* delimiter */ *temp = *prog; prog++; /* advance to next position */ temp++; *temp = ''; return (token_type=DELIMITER); } if(*prog=='"') { /* quoted string */ prog++; while(*prog!='"'&& *prog!='\r') *temp++ = *prog++; if(*prog=='\r') sntx_err(SYNTAX); prog++; *temp = ''; return(token_type=STRING); } if(isdigit(*prog)) { /* number */ while(!isdelim(*prog)) *temp++ = *prog++; *temp = ''; return(token_type = NUMBER); } if(isalpha(*prog)) { /* var or command */ while(!isdelim(*prog)) *temp++ = *prog++; token_type=TEMP; } *temp = ''; /* see if a string is a command or a variable */ if(token_type==TEMP) { tok = look_up(token); /* convert to internal rep */ if(tok) token_type = KEYWORD; /* is a keyword */ else token_type = IDENTIFIER; } return token_type; } /* Return a token to input stream. */ void putback(void) { char *t; t = token; for(; *t; t++) prog--; } /* Look up a token's internal representation in the token table. */ look_up(char *s) { register int i; char *p; /* convert to lowercase */ p = s; while(*p){ *p = tolower(*p); p++; } /* see if token is in table */ for(i=0; *table[i].command; i++) if(!strcmp(table[i].command, s)) return table[i].tok; return 0; /* unknown command */ } /* Return index of internal library function or -1 if not found. */ internal_func(char *s) { int i; for(i=0; intern_func[i].f_name[0]; i++) { if(!strcmp(intern_func[i].f_name, s)) return i; } return -1; } /* Return true if c is a delimiter. */ isdelim(char c) { if(strchr(" !;,+-<>'/*%^=()", c) || c==9 || c=='\r' || c==0) return 1; return 0; } /* Return 1 if c is space or tab. */ iswhite(char c) { if(c==' ' || c=='\t') return 1; else return 0; }
[LISTING TWO]
/* A Little C interpreter. */ #include "stdio.h" #include "setjmp.h" #include "math.h" #include "ctype.h" #include "stdlib.h" #include "string.h" #define NUM_FUNC 100 #define NUM_GLOBAL_VARS 100 #define NUM_LOCAL_VARS 200 #define NUM_BLOCK 100 #define ID_LEN 31 #define FUNC_CALLS 31 #define NUM_PARAMS 31 #define PROG_SIZE 10000 #define LOOP_NEST 31 enum tok_types {DELIMITER, IDENTIFIER, NUMBER, KEYWORD, TEMP, STRING, BLOCK}; /* add additional C keyword tokens here */ enum tokens {ARG, CHAR, INT, IF, ELSE, FOR, DO, WHILE, SWITCH, RETURN, EOL, FINISHED, END}; /* add additional double operators here (such as ->) */ enum double_ops {LT=1, LE, GT, GE, EQ, NE}; /* These are the constants used to call sntx_err() when a syntax error occurs. Add more if you like. NOTE: SYNTAX is a generic error message used when nothing else seems appropriate. */ enum error_msg {SYNTAX, UNBAL_PARENS, NO_EXP, EQUALS_EXPECTED, NOT_VAR, PARAM_ERR, SEMI_EXPECTED, UNBAL_BRACES, FUNC_UNDEF, TYPE_EXPECTED, NEST_FUNC, RET_NOCALL, PAREN_EXPECTED, WHILE_EXPECTED, QUOTE_EXPECTED, NOT_TEMP, TOO_MANY_LVARS}; char *prog; /* current location in source code */ char *p_buf; /* points to start of program buffer */ jmp_buf e_buf; /* hold environment for longjmp() */ /* An array of these structures will hold the info associated with global variables. */ struct var_type { char var_name[ID_LEN]; int var_type; int value; } global_vars[NUM_GLOBAL_VARS]; struct var_type local_var_stack[NUM_LOCAL_VARS]; struct func_type { char func_name[ID_LEN]; char *loc; /* location of entry point in file */ } func_table[NUM_FUNC]; int call_stack[NUM_FUNC]; struct commands { /* keyword lookup table */ char command[20]; char tok; } table[] = { /* Commands must be entered lowercase */ "if", IF, /* in this table. */ "else", ELSE, "for", FOR, "do", DO, "while", WHILE, "char", CHAR, "int", INT, "return", RETURN, "end", END, "", END /* mark end of table */ }; char token[80]; char token_type, tok; int functos; /* index to top of function call stack */ int func_index; /* index into function table */ int gvar_index; /* index into global variable table */ int lvartos; /* index into local variable stack */ int ret_value; /* function return value */ void print(void), prescan(void); void decl_global(void), call(void), putback(void); void decl_local(void), local_push(struct var_type i); void eval_exp(int *value), sntx_err(int error); void exec_if(void), find_eob(void), exec_for(void); void get_params(void), get_args(void); void exec_while(void), func_push(int i), exec_do(void); void assign_var(char *var_name, int value); int load_program(char *p, char *fname), find_var(char *s); void interp_block(void), func_ret(void); int func_pop(void), is_var(char *s), get_token(void); char *find_func(char *name); main(int argc, char *argv[]) { if(argc!=2) { printf("usage: c <filename>\n"); exit(1); } /* allocate memory for the program */ if((p_buf=(char *) malloc(PROG_SIZE))==NULL) { printf("allocation failure"); exit(1); } /* load the program to execute */ if(!load_program(p_buf, argv[1])) exit(1); if(setjmp(e_buf)) exit(1); /* initialize long jump buffer */ /* set program pointer to start of program buffer */ prog = p_buf; prescan(); /* find the location of all functions and global variables in the program */ gvar_index = 0; /* initialize global variable index */ lvartos = 0; /* initialize local variable stack index */ functos = 0; /* initialize the CALL stack index */ /* setup call to main() */ prog = find_func("main"); /* find program starting point */ prog--; /* back up to opening ( */ strcpy(token, "main"); call(); /* call main() to start interpreting */ } /* Interpret a single statement or block of code. When interp_block() returns from it's initial call, the final brace (or a return) in main() has been encountered. */ void interp_block(void) { int value; char block = 0; do { token_type = get_token(); /* If interpreting single statement, return on first semicolon. */ /* see what kind of token is up */ if(token_type==IDENTIFIER) { /* Not a keyword, so process expression. */ putback(); /* restore token to input stream for further processing by eval_exp() */ eval_exp(&value);/* process the expression */ if(*token!=';') sntx_err(SEMI_EXPECTED); } else if(token_type==BLOCK) { /* if block delimiter */ if(*token=='{') /* is a block */ block = 1; /* interpreting block, not statement */ else return; /* is a }, so return */ } else /* is keyword */ switch(tok) { case CHAR: case INT: /* declare local variables */ putback(); decl_local(); break; case RETURN: /* return from function call */ func_ret(); return; case IF: /* process an if statement */ exec_if(); break; case ELSE: /* process an else statement */ find_eob(); /* find end of else block and continue execution */ break; case WHILE: /* process a while loop */ exec_while(); break; case DO: /* process a do-while loop */ exec_do(); break; case FOR: exec_for(); break; case END: exit(0); } } while (tok != FINISHED && block); } /* Load a program. */ load_program(char *p, char *fname) { FILE *fp; int i=0; if((fp=fopen(fname, "rb"))==NULL) return 0; i = 0; do { *p = getc(fp); p++; i++; } while(!feof(fp) && i<PROG_SIZE); *(p-2) = ''; /* null terminate the program */ fclose(fp); return 1; } /* Find the location of all functions in the program and store global variables. */ void prescan(void) { char *p; char temp[32]; int brace = 0; /* When 0, this var tells us that current source position is outside of any function. */ p = prog; func_index = 0; do { while(brace) { /* bypass code inside functions */ get_token(); if(*token=='{') brace++; if(*token=='}') brace--; } get_token(); if(tok==CHAR || tok==INT) { /* is global var */ putback(); decl_global(); } else if(token_type==IDENTIFIER) { strcpy(temp, token); get_token(); if(*token=='(') { /* must be assume a function */ func_table[func_index].loc = prog; strcpy(func_table[func_index].func_name, temp); func_index++; while(*prog!=')') prog++; prog++; /* prog points to opening curly brace of function */ } else putback(); } else if(*token=='{') brace++; } while(tok!=FINISHED); prog = p; } /* Return the entry point of the specified function. Return NULL if not found. */ char *find_func(char *name) { register int i; for(i=0; i<func_index; i++) if(!strcmp(name, func_table[i].func_name)) return func_table[i].loc; return NULL; } /* Declare a global variable. */ void decl_global(void) { get_token(); /* get type */ global_vars[gvar_index].var_type = tok; global_vars[gvar_index].value = 0; /* init to 0 */ do { /* process comma-separated list */ get_token(); /* get name */ strcpy(global_vars[gvar_index].var_name, token); get_token(); gvar_index++; } while(*token==','); if(*token!=';') sntx_err(SEMI_EXPECTED); } /* Declare a local variable. */ void decl_local(void) { struct var_type i; get_token(); /* get type */ i.var_type = tok; i.value = 0; /* init to 0 */ do { /* process comma-separated list */ get_token(); /* get var name */ strcpy(i.var_name, token); local_push(i); get_token(); } while(*token==','); if(*token!=';') sntx_err(SEMI_EXPECTED); } /* Call a function. */ void call(void) { char *loc, *temp; int lvartemp; loc = find_func(token); /* find entry point of function */ if(loc==NULL) sntx_err(FUNC_UNDEF); /* function not defined */ else { lvartemp = lvartos; /* save local var stack index */ get_args(); /* get function arguments */ temp = prog; /* save return location */ func_push(lvartemp); /* save local var stack index */ prog = loc; /* reset prog to start of function */ get_params(); /* load the function's parameters with the values of the arguments */ interp_block(); /* interpret the function */ prog = temp; /* reset the program pointer */ lvartos = func_pop(); /* reset the local var stack */ } } /* Push the arguments to a function onto the local variable stack. */ void get_args(void) { int value, count, temp[NUM_PARAMS]; struct var_type i; count = 0; get_token(); if(*token!='(') sntx_err(PAREN_EXPECTED); /* process a comma-separated list of values */ do { eval_exp(&value); temp[count] = value; /* save temporarily */ get_token(); count++; }while(*token==','); count--; /* now, push on local_var_stack in reverse order */ for(; count>=0; count--) { i.value = temp[count]; i.var_type = ARG; local_push(i); } } /* Get function parameters. */ void get_params(void) { struct var_type *p; int i; i = lvartos-1; do { /* process comma-separated list of parameters */ get_token(); p = &local_var_stack[i]; if(*token!=')') { if(tok!=INT && tok!=CHAR) sntx_err(TYPE_EXPECTED); p->var_type = token_type; get_token(); /* link parameter name with argument already on local var stack */ strcpy(p->var_name, token); get_token(); i--; } else break; } while(*token==','); if(*token!=')') sntx_err(PAREN_EXPECTED); } /* Return from a function. */ void func_ret(void) { int value; value = 0; /* get return value, if any */ eval_exp(&value); ret_value = value; } /* Push local variable */ void local_push(struct var_type i) { if(lvartos>NUM_LOCAL_VARS) sntx_err(TOO_MANY_LVARS); local_var_stack[lvartos] = i; lvartos++; } /* Pop index into local variable stack. */ func_pop(void) { functos--; if(functos<0) sntx_err(RET_NOCALL); return(call_stack[functos]); } /* Push index of local variable stack. */ void func_push(int i) { if(functos>NUM_FUNC) sntx_err(NEST_FUNC); call_stack[functos] = i; functos++; } /* Assign a value to a variable. */ void assign_var(char *var_name, int value) { register int i; /* first, see if it's a local variable */ for(i=lvartos-1; i>=call_stack[functos-1]; i--) { if(!strcmp(local_var_stack[i].var_name, var_name)) { local_var_stack[i].value = value; return; } } if(i < call_stack[functos-1]) /* if not local, try global var table */ for(i=0; i<NUM_GLOBAL_VARS; i++) if(!strcmp(global_vars[i].var_name, var_name)) { global_vars[i].value = value; return; } sntx_err(NOT_VAR); /* variable not found */ } /* Find the value of a variable. */ int find_var(char *s) { register int i; /* first, see if it's a local variable */ for(i=lvartos-1; i>=call_stack[functos-1]; i--) if(!strcmp(local_var_stack[i].var_name, token)) return local_var_stack[i].value; /* otherwise, try global vars */ for(i=0; i<NUM_GLOBAL_VARS; i++) if(!strcmp(global_vars[i].var_name, s)) return global_vars[i].value; sntx_err(NOT_VAR); /* variable not found */ } /* Determine if an identifier is a variable. Return 1 if variable is found; 0 otherwise. */ int is_var(char *s) { register int i; /* first, see if it's a local variable */ for(i=lvartos-1; i>=call_stack[functos-1]; i--) if(!strcmp(local_var_stack[i].var_name, token)) return 1; /* otherwise, try global vars */ for(i=0; i<NUM_GLOBAL_VARS; i++) if(!strcmp(global_vars[i].var_name, s)) return 1; return 0; } /* Execute an IF statement. */ void exec_if(void) { int cond; eval_exp(&cond); /* get left expression */ if(cond) { /* is true so process target of IF */ interp_block(); } else { /* otherwise skip around IF block and process the ELSE, if present */ find_eob(); /* find start of next line */ get_token(); if(tok!=ELSE) { putback(); /* restore token if no ELSE is present */ return; } interp_block(); } } /* Execute a while loop. */ void exec_while(void) { int cond; char *temp; putback(); temp = prog; /* save location of top of while loop */ get_token(); eval_exp(&cond); /* check the conditional expression */ if(cond) interp_block(); /* if true, interpret */ else { /* otherwise, skip around loop */ find_eob(); return; } prog = temp; /* loop back to top */ } /*Execute a do loop. */ void exec_do(void) { int cond; char *temp; putback(); temp = prog; /* save location of top of do loop */ get_token(); /* get start of loop */ interp_block(); /* interpret loop */ get_token(); if(tok!=WHILE) sntx_err(WHILE_EXPECTED); eval_exp(&cond); /* check the loop condition */ if(cond) prog = temp; /* if true loop; otherwise, continue on */ } /* Find the end of a block. */ void find_eob(void) { int brace; get_token(); brace = 1; do { get_token(); if(*token=='{') brace++; else if(*token=='}') brace--; } while(brace); } /* Execute a while loop. */ void exec_for(void) { int cond; char *temp, *temp2; int brace ; get_token(); eval_exp(&cond); /*initialization expression */ if(*token!=';') sntx_err(SEMI_EXPECTED); prog++; /* get past the ; */ temp = prog; for(;;) { eval_exp(&cond); /* check the condition */ if(*token!=';') sntx_err(SEMI_EXPECTED); prog++; /* get past the ; */ temp2 = prog; /* find the start of the for block */ brace = 1; while(brace) { get_token(); if(*token=='(') brace++; if(*token==')') brace--; } if(cond) interp_block(); /* if true, interpret */ else { /* otherwise, skip around loop */ find_eob(); return; } prog = temp2; eval_exp(&cond); /* do the increment */ prog = temp; /* loop back to top */ } }
[LISTING THREE]
/****** Internal Library Functions *******/ /* Add more of your own, here. */ #include "conio.h" /* if your compiler does not support this header file, remove it */ #include "stdio.h" #include "stdlib.h" extern char *prog; /* points to current location in program */ extern char token[80]; /* holds string representation of token */ extern char token_type; /* contains type of token */ extern char tok; /* holds the internal representation of token */ enum tok_types {DELIMITER, IDENTIFIER, NUMBER, COMMAND, STRING, QUOTE, VARIABLE, BLOCK, FUNCTION}; /* These are the constants used to call sntx_err() when a syntax error occurs. Add more if you like. NOTE: SYNTAX is a generic error message used when nothing else seems appropriate. */ enum error_msg {SYNTAX, UNBAL_PARENS, NO_EXP, EQUALS_EXPECTED, NOT_VAR, PARAM_ERR, SEMI_EXPECTED, UNBAL_BRACES, FUNC_UNDEF, TYPE_EXPECTED, NEST_FUNC, RET_NOCALL, PAREN_EXPECTED, WHILE_EXPECTED, QUOTE_EXPECTED, NOT_STRING, TOO_MANY_LVARS}; int get_token(void); void sntx_err(int error), eval_exp(int *result); void putback(void); /* Get a character from console. (Use getchar()) if your compiler does not support getche().) */ call_getche() { char ch; ch = getche(); while(*prog!=')') prog++; prog++; /* advance to end of line */ return ch; } /* Put a character to the display. (Use putchar() if your compiler does not support putch().) */ call_putch() { int value; eval_exp(&value); printf("%c", value); return value; } /* Call puts(). */ call_puts(void) { get_token(); if(*token!='(') sntx_err(PAREN_EXPECTED); get_token(); if(token_type!=QUOTE) sntx_err(QUOTE_EXPECTED); puts(token); get_token(); if(*token!=')') sntx_err(PAREN_EXPECTED); get_token(); if(*token!=';') sntx_err(SEMI_EXPECTED); putback(); return 0; } /* A built-in console output function. */ int print(void) { int i; get_token(); if(*token!='(') sntx_err(PAREN_EXPECTED); get_token(); if(token_type==QUOTE) { /* output a string */ printf("%s ", token); } else { /* output a number */ putback(); eval_exp(&i); printf("%d ", i); } get_token(); if(*token!=')') sntx_err(PAREN_EXPECTED); get_token(); if(*token!=';') sntx_err(SEMI_EXPECTED); putback(); return 0; } /* Read an integer from the keyboard. */ getnum(void) { char s[80]; gets(s); while(*prog!=')') prog++; prog++; /* advance to end of line */ return atoi(s); }
[LISTING FOUR]
/* C Interpreter Demonstration Program This program demonstrates all features of C that are recognized by this C interpreter. */ int i, j; /* global vars */ char ch; main() { int i, j; /* local vars */ puts("C Demo Program."); print_alpha(); do { puts("enter a number (0 to quit): "); i = getnum(); if(i < 0 ) { puts("numbers must be positive, try again"); } else { for(j = 0; j < i; j=j+1) { print(j); print("summed is"); print(sum(j)); puts(""); } } } while(i!=0); } /* Sum the values between 0 and num. */ sum(int num) { int running_sum; running_sum = 0; while(num) { running_sum = running_sum + num; num = num - 1; } return running_sum; } /* Print the alphabet. */ print_alpha() { for(ch = 'A'; ch<='Z'; ch = ch + 1) { putch(ch); } puts(""); } /* Nested loop example. */ main() { int i, j, k; for(i = 0; i < 5; i = i + 1) { for(j = 0; j < 3; j = j + 1) { for(k = 3; k ; k = k - 1) { print(i); print(j); print(k); puts(""); } } } puts("done"); } /* Assigments as operations. */ main() { int a, b; a = b = 10; print(a); print(b); while(a=a-1) { print(a); do { print(b); }while((b=b-1) > -10); } } /* This program demonstrates recursive functions. */ main() { print(factr(7) * 2); } /* return the factorial of i */ factr(int i) { if(i<2) { return 1; } else { return i * factr(i-1); } } /* A more rigorous example of function arguments. */ main() { f2(10, f1(10, 20), 99); } f1(int a, int b) { int count; print("in f1"); count = a; do { print(count); } while(count=count-1); print(a); print(b); print(a*b); return a*b; } f2(int a, int x, int y) { print(a); print(x); print(x / a); print(y*x); } /* The loop statements. */ main() { int a; char ch; /* the while */ puts("Enter a number: "); a = getnum(); while(a) { print(a); print(a*a); puts(""); a = a - 1; } /* the do-while */ puts("enter characters, 'q' to quit"); do { ch = getche(); } while(ch!='q'); /* the for */ for(a=0; a<10; a = a + 1) { print(a); } }