Stack-C
Stack-C is a stack-based language designed with the following
properties:
- Can be used as a programming language in its own right.
- Serves as an intermediate language for a C-compiler.
- Can be easily compared with the original C program.
- Can be easily compiled to M1 assembly in a single pass.
- Has no type checking.
- All values are considered to be 32-bit integers. There is no
support for floating-point numbers.
- Uses C keywords to avoid introducing 'forbidden' identifiers
in the C programs that are compiled by the C-compiler.
- Identifiers follow C-syntax rules.
No clear design philosophy was followed. The language was developed
alongside the C-compiler and was shaped by
some pragmatic decisions. Initially it was thought of as a
(rather Spartan) low-level programming language, and it
contains some constructs that are not generated by the
C-compiler. It is therefore rather arbitrary that it has
support for short-circuit logical 'or' and 'and', which
can simply be emulated with an if-statement,
while it has no support for for-statements and
switch-statements.
Keywords
The following C keywords are used:
- const: to define a constant value.
- void: to define a function.
- int: to define a global or a local variable.
- static: to define a static variable in a function.
- do: to define a loop.
- break: to break out from a loop.
- continue: to continue a loop from the start.
- if: for an if-statement.
- else: for the else part of an if-statement.
- return: to return from a function.
- goto: to jump to a specified label.
- char: to sign extend a byte value to a 32-bit value.
Comments
Lines starting with the # character are considered comments
and copied verbatim to the output.
Values
An integer value (following C syntax) will push that value on
the top of the stack.
A single quoted character will push the value of the character
on the stack. The following escape sequences are recognized:
\0, \n, \r, and \t.
A double-quoted string will push the address of the string constant
on the stack. Repeated string constants will point to the same
address. The same escape sequences as for a character are
recognized.
An identifier (not used in another context) will push the value
associated with the identifier on the top of the stack. For
variables the value is the address of the variable and for functions
it is the start address of the function.
Stack operators
The following operators work like the C operators: they take the top
two values from the stack and push the result on the stack (thus
making the stack one value shorter). They presume that the
values represent unsigned integers.
+ - * / % & | ^ << >> == != < <= > >=
The following operators are similar to the above, except
that they presume that the values represent signed integers.
/s <s <=s >s >=s
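The difference between the two operator families only shows up when the top bit of a value is set. A minimal sketch in C of the two interpretations follows; the function names are hypothetical and not taken from stack_c.c, and the conversion from uint32_t to int32_t is assumed to wrap around as on two's complement machines.

```c
#include <stdint.h>

/* Unsigned interpretation, as presumed by the plain operators < and /. */
uint32_t op_lt(uint32_t a, uint32_t b)  { return a < b; }
uint32_t op_div(uint32_t a, uint32_t b) { return a / b; }

/* Signed interpretation, as presumed by <s and /s.
   The same 32 bits are reinterpreted as a two's complement number.
   Division by zero and INT32_MIN / -1 are not handled in this sketch. */
uint32_t op_lt_s(uint32_t a, uint32_t b)
{
    return (int32_t)a < (int32_t)b;
}

uint32_t op_div_s(uint32_t a, uint32_t b)
{
    return (uint32_t)((int32_t)a / (int32_t)b);
}
```

With the bit pattern 0xFFFFFFFF, the unsigned comparison against 1 yields 0 (a very large number), while the signed comparison yields 1 (the value is read as -1).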
The following operators are similar to the C operators but operate
on the top value (replacing it).
~ !
The following special operators are defined:
- $: Duplicate the top value on the stack.
- ;: Pop the top value off the stack.
- ><: Swap the top two values of the stack.
- ?: Replace the top value of the stack by the value from the
four memory locations pointed to by the value on the stack.
- ?2: Similar for two memory locations.
- ?1: Similar for one memory location.
- !: Store the top value in the four memory locations pointed to
by the second top value on the stack. It removes the second top
value from the stack.
- !2: Similar for two memory locations.
- !1: Similar for one memory location.
- =:: Store the second value on the stack in the four locations
pointed to by the top value on the stack. Removes the top two
values from the stack. (Similar to >< = ;)
- -p: Equivalent to -. (It is needed for the Stack-C
interpreter to work correctly.)
- -> followed by a constant identifier: This will retrieve
the value in the four memory locations pointed to by the top
value on the stack and replace the top value with this value
increased by the value of the constant.
- (): Calls the function pointed to by the top value on the
stack. The top value is popped from the stack.
- char: Replaces the top of the stack with the sign extended
value of the least significant byte.
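The memory operators and char can be modelled in C as follows. This is only a sketch under stated assumptions: memory is a flat byte array, bytes are in little-endian order (as on x86), and the one and two byte fetches zero-extend, which the text does not state explicitly. The names fetch, store, and op_char are hypothetical.

```c
#include <stdint.h>

/* Fetch `width` bytes (1, 2, or 4) starting at addr, little-endian.
   The result is zero-extended to 32 bits (an assumption). */
uint32_t fetch(const uint8_t *mem, uint32_t addr, int width)
{
    uint32_t v = 0;
    for (int i = 0; i < width; i++)
        v |= (uint32_t)mem[addr + i] << (8 * i);
    return v;
}

/* Store the low `width` bytes of value starting at addr, little-endian. */
void store(uint8_t *mem, uint32_t addr, uint32_t value, int width)
{
    for (int i = 0; i < width; i++)
        mem[addr + i] = (uint8_t)(value >> (8 * i));
}

/* char: sign extend the least significant byte to 32 bits. */
uint32_t op_char(uint32_t v)
{
    uint32_t b = v & 0xFFu;
    return (b & 0x80u) ? (b | 0xFFFFFF00u) : b;
}
```

Under these assumptions, fetching one byte with the value 0x80 and then applying char gives 0xFFFFFF80, which is how a signed C char would be widened.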
Statements
Constant definition
A constant definition starts with the keyword const followed
by an identifier and an integer constant.
Function definition
A function definition starts with the keyword void followed by
an identifier. It is followed by either a ; character, to
indicate a forward declaration of the function, or by a code block.
The curly brackets { and } are used for block definitions.
Global or local variable definitions
A global or local variable definition starts with the keyword
int followed by an optional positive integer and an identifier.
The optional integer specifies the size in multiples of four
bytes. When a variable definition occurs within a block it is
considered to be local to that block.
Static variable definitions
These are like local variable definitions except that they start
with the keyword static.
If statement
The if statement starts with the if keyword and uses the top
value on the stack to determine whether the following block will be
executed or, in case that block is followed by the else
keyword, the block following that keyword.
Do statement
The do statement starts with the do keyword followed by a block.
Within a do statement, the keywords break and continue may be
used to jump out of the loop and to the start of the loop
respectively.
Logic and and or statements
The logic and and or statements start with && and
|| respectively, followed by a code block (in order to implement the
short-circuit functionality). Whether the block will be executed
depends on the top value of the stack, which is popped in case
the block is executed. For this reason && { is
equivalent to $ if { ; and || { to $ ! if { ;.
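The equivalence above can be traced with a small C model. It is a sketch only: it assumes that if pops the value it tests and that the block leaves exactly one value on the stack. The names op_and, op_or, block_fn, and rhs_seven are hypothetical.

```c
#include <stdint.h>

/* A code block that leaves one value on the stack. */
typedef uint32_t (*block_fn)(void);

/* lhs && { rhs }, following $ if { ; rhs } */
uint32_t op_and(uint32_t lhs, block_fn rhs)
{
    if (lhs != 0)       /* $ if { : test a copy of the top value      */
        return rhs();   /* ;      : drop lhs, the block gives the result */
    return lhs;         /* block skipped, lhs (zero) stays on the stack */
}

/* lhs || { rhs }, following $ ! if { ; rhs } */
uint32_t op_or(uint32_t lhs, block_fn rhs)
{
    if (lhs == 0)       /* $ ! if { : test the negated copy */
        return rhs();
    return lhs;         /* block skipped, the non-zero lhs stays */
}

/* Example block for illustration: counts its calls and yields 7. */
static int rhs_calls = 0;
static uint32_t rhs_seven(void) { rhs_calls++; return 7; }
```

One consequence of this reading is that a skipped || block leaves the original non-zero value on the stack rather than 1, so op_or(5, rhs_seven) gives 5 without calling the block.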
Return statements
The return statement consists of the return keyword. The stack is
not affected.
Goto statement
The goto statement starts with the goto keyword followed by an
identifier representing a label. Labels are defined by : followed
by an identifier. The label definition may occur before or after
the goto statement.
Stack-C compiler
The Stack-C compiler is implemented in
stack_c.c. It produces output for the M1 assembler. The contents of the
file stack_c_intro.M1 are copied to the start of the output, which
introduces the labels ELF_text,
_start, f_sys_int80, f_sys_malloc, and SYS_MALLOC. The
compiler also generates some new labels, such as ELF_end
and those described below. For all the global variables, the
compiler produces labels prefixed with l_ and for all the
functions labels prefixed with f_.
For labels used inside functions the compiler produces labels
of the form l_%s_%s, where the first %s is replaced with
the function name and the second with the label name. (This
could lead to a problem, if, for example, there is a function with
the name func_a_x with a label b and a function with the name
func_a with a label x_b.)
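The collision mentioned above is easy to reproduce with the same format string. The helper make_local_label below is hypothetical and only illustrates the l_%s_%s scheme; it is not the code from stack_c.c.

```c
#include <stdio.h>

/* Build a function-local label with the l_%s_%s scheme:
   first the function name, then the label name. */
void make_local_label(char *out, size_t size,
                      const char *function, const char *label)
{
    snprintf(out, size, "l_%s_%s", function, label);
}
```

Calling it with ("func_a_x", "b") and with ("func_a", "x_b") produces l_func_a_x_b in both cases, because the underscore separator can also occur inside identifiers.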
For implementing the various language constructs, it will introduce
labels of the following forms, where %s is replaced by the
function name and %d by a unique integer value (for the function):
- _%s_else%d
- _%s_else_end%d
- _%s_loop%d
- _%s_loop_end%d
- _%s_and_end%d
- _%s_or_end%d
For each static variable, it will introduce a label of the form
static_%d_%s, where %d is replaced by a unique number
and %s by the name of the static variable. (Note that it is
possible to have several static variables with the same name in
a single function.)
For each constant string, it will introduce a label of the form
string_%d, where the %d is replaced by a unique number.
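Since repeated string constants point to the same address, the compiler has to recognize a string it has seen before and reuse its number. One way to do this is a simple lookup table, sketched below. The table size, the numbering from zero, and the name string_label_index are assumptions for illustration, and strings with an embedded \0 are not handled.

```c
#include <string.h>

#define MAX_STRINGS 64   /* arbitrary limit for this sketch */

static const char *string_table[MAX_STRINGS];
static int string_count = 0;

/* Return the number to use in the string_%d label for text.
   A string seen before gets its earlier number back.
   Returns -1 when the table is full. The caller must keep
   text alive, because only the pointer is stored. */
int string_label_index(const char *text)
{
    for (int i = 0; i < string_count; i++)
        if (strcmp(string_table[i], text) == 0)
            return i;
    if (string_count == MAX_STRINGS)
        return -1;
    string_table[string_count] = text;
    return string_count++;
}
```

With this table, the first and third of the constants "hello", "world", "hello" receive the same number and therefore the same label.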
x86 implementation
The implementation makes use of two stacks. One stack contains
the temporary values used for the evaluation of expressions,
including arguments passed to functions and results returned
by functions. For this the normal stack is used. The other
stack is used for local variables and also the return addresses
of functions being called. Its stack pointer is stored in the ebp
register. All local variables are given a positive index with
respect to this second stack pointer, which is
moved on function call and exit. Before main is called,
100,000 bytes are allocated for this second stack.
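The division of labour between the two stacks can be pictured with a small C model. The layout is an assumption made for illustration: the return address is placed in slot 0 of a frame, locals follow at positive indexes, and the frame pointer is advanced by the size of the caller's frame on a call. The actual layout generated by stack_c.c may differ. All names are hypothetical and no bounds checks are made.

```c
#include <stdint.h>

#define VALUE_STACK_WORDS 256      /* arbitrary size for this sketch */
#define FRAME_STACK_BYTES 100000   /* size mentioned in the text     */

typedef struct {
    uint32_t values[VALUE_STACK_WORDS];      /* expression stack (the normal stack) */
    int      top;                            /* number of values on it              */
    uint32_t frames[FRAME_STACK_BYTES / 4];  /* second stack: return addresses, locals */
    uint32_t fp;                             /* word index, plays the role of ebp   */
} Machine;

void push(Machine *m, uint32_t v) { m->values[m->top++] = v; }
uint32_t pop(Machine *m)          { return m->values[--m->top]; }

/* Address of a local variable at a positive index from the frame pointer. */
uint32_t *local(Machine *m, uint32_t index)
{
    return &m->frames[m->fp + index];
}

/* Call: move the frame pointer past the caller's frame and
   record the return address in slot 0 of the new frame. */
void enter(Machine *m, uint32_t caller_frame_words, uint32_t return_address)
{
    m->fp += caller_frame_words;
    m->frames[m->fp] = return_address;
}

/* Exit: read the return address and move the frame pointer back. */
uint32_t leave(Machine *m, uint32_t caller_frame_words)
{
    uint32_t return_address = m->frames[m->fp];
    m->fp -= caller_frame_words;
    return return_address;
}
```

Because arguments and results travel on the value stack, a call in this model does not touch them: the callee pops its arguments and pushes its result, while its locals live in a separate frame that disappears on exit.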
Stack-C interpreter
The Stack-C interpreter is implemented in
stack_c_interpreter.c. This interpreter
was primarily developed for debugging purposes, not for
speed. It keeps track of what kind of data is stored in a memory
location: whether it is a value or a pointer to a
memory location, to a function, or to a constant string. It also
performs range checking by representing each pointer as a pair of
a pointer to the start of a memory range (or string constant)
and an index. It generates errors when 'illegal' operations
are performed and warnings when tricky operations are performed,
such as comparing pointers into different memory
ranges, because their order could depend on the implementation
of the memory allocation function. When an error is reported, the
contents of the stack and the call stack are printed. Comment lines
that consist of a file name followed by a number are interpreted
as references to source files. Information from these is
used in the call stack.
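The pointer representation described above can be sketched in C as a range plus an index. The type and function names are hypothetical and do not come from stack_c_interpreter.c; storing the size in the range record and reading the bytes in little-endian order are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *start;   /* start of the memory range            */
    size_t   size;    /* length of the range in bytes (assumed) */
} Range;

typedef struct {
    const Range *range;   /* which range the pointer belongs to */
    int32_t      index;   /* byte offset within that range      */
} CheckedPointer;

/* Pointer arithmetic only changes the index, never the range. */
CheckedPointer pointer_add(CheckedPointer p, int32_t delta)
{
    p.index += delta;
    return p;
}

/* Fetch four bytes (little-endian). Returns 1 on success and
   0 when the access falls outside the range. */
int checked_fetch4(CheckedPointer p, uint32_t *out)
{
    if (p.index < 0 || (size_t)p.index + 4 > p.range->size)
        return 0;
    const uint8_t *b = p.range->start + p.index;
    *out = (uint32_t)b[0] | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    return 1;
}

/* Ordering comparisons are only meaningful within one range;
   a result of 0 is the case that would deserve a warning. */
int same_range(CheckedPointer a, CheckedPointer b)
{
    return a.range == b.range;
}
```

Keeping the range separate from the index means an out-of-range pointer can still be formed by arithmetic, and the error is raised only when it is actually dereferenced.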
The interpreter assumes that all pointers are stored at 'word'
(four byte) offsets.
The interpreter only has support for a limited number of system
calls, namely:
- 1: exit
- 3: read
- 4: write
- 5: open
- 6: close
- 10: unlink
- 19: lseek
- 183: getcwd
There have been plans to incorporate a debugger to allow inspection
of values on the stack and the values of variables.