Online interactive parser

This page gives an interactive parser, which allows you to enter a grammar and use it to parse some input. It can be a helpful tool in developping a grammar and testing it agains example input. The parser is a reimplementation of (a subset of) IParse, an interpretting parser, in JavaScript and responds to keystrokes, which gives immediate response to the grammar and input being entered. The abstract syntax tree is displayed as a tree. For demonstration purposes, an evaluator for a limited set of language constructs has been added, such that expression represented by the abstract syntax tree can be evaluated and some basic control flow statement can be executed.

This was created for an online workshop on parsing which was held on and was recorded.

The source code can be found here. The code for showing the syntax tree is based on Syntactic Tree Viewer by Christos Christodoulopoulos, which makes use of d3.js and itself is inspired by D3.js Drag and Drop Zoomable Tree by Rob Schmuecker. A simular approacht, with a more advanced editor (with syntax-colour) is AGL Editor Demo.

Grammar description

To parse input that consists of an integer, an identifier and a string, the following grammar can be used:

root : int ident string .

This grammar consists of one production rule. The production rule starts with the root non-terminal, which is the non-terminal that is used to parse the whole input. After the colon the elements of the production rule are given. The element can be terminals and non-terminals. In this case it consists of three terminals. At the end of the production rule a period is placed. There are four predefined terminals defined:

int: parses a whole number
ident: parses a identifier, starting with an alphabetic character or an underscore followed by zero or more alphabetic characts, underscores or numerical characters.
string: parses a string surrounded with double quotes. The back-slash character acts as an escape character.
char: parses a character surrounded with single quotes. The back-slash character acts as an escape character.

An example input that should parse correctly is:

123 abc "test ?*"

The abstract syntax tree of the parsing is printed as:

(int:123,ident:abc,string:"test ?*")

The following grammar parses the same input as the above, but now it uses two extra non-terminal and two additional production rule for those non-terminals:

root : first rest .
first : int .
rest : ident string .

But this does affect the abstract syntax tree, which now will look like:

(int:123,(ident:abc,string:"test ?*"))

To parse input that consists of the keyword if followed by an identifier between round brackets, the following grammar can be used:

root : "if" "(" ident ")" .

In this grammar rule, literal strings, between double quotes are used as additional terminals in the grammar. Literal strings that start with an alphabetic character are treated as keywords, meaning that it should be terminated with a non-identifier character. It also means that it is excluded as a valid identifier. An example of input that should parse correctly is:

if ( a )

The following input is not parsed correctly, because although it starts with if it does not start with an identifier that is equal to if.

iff ( a )

The following input does not parse correctly, because if is no longer a valid identifier:

if ( if )

To parse input that consists an optional integer, one or more identifiers, and zero or more strings, the following grammar can be used:

root : int OPT 
       ident SEQ
       string SEQ OPT .

The OPT placed after an element indicates that the elements is optional. The SEQ placed after an element indicates that the element can occur one or more times. The combination of SEQ and OPT indicates that the element can occur zero or more times.

To parse input that consists of a sum of integers, the following grammar can be used:

root : int CHAIN "+"

The CHAIN followed by a literal indicates a chain sequence, where the literal is used as a separator between the elements. An example input is:

1 + 6 + 5

The LIST is a shorthand for a chain sequence with a comma.

To parse input that consists of intermixed list of integers and identifiers (separated by commas), the following grammar can be used:

root : ( int | ident ) LIST .

This will parse an input like:

ad, 4, 5, gh, jk

The grammar rule above also shows the use of brackets to group elements and the vertical bar character to separate alternatives. The vertical bar can also be used to combine several production rules for one non-terminal together.

When the input parses correctly, an abstract syntax tree is displayed in the text area to the right of the input in textual form and as a tree in the area below it. To annotate this abstract syntax tree, it is possible to annotate the production rules with identifiers between square brackets. An example of this is given in the grammar below, a grammar for simple expressions to add, multiply, and divide numbers:

root : E .
E : F CHAIN "+" [sum].
F : T | F "*" T [times] | F "/" T [div].
T : int | "(" E ")".

Whitespace is accepted between all the terminals. As whitespace the space, the tab and the newline characters are accepted and also the two types of comments that are allowed in the C-language: Any text between /* and */ and the remainder of the line followed by a // sequence.

For educational purposes, the rules with respect to the specification of terminals are limited. For more options, see the C++ implementation of IParse and the C implementation of RawParser.

Interactive parser

Below you can enter a grammar and some input to be parsed according to the grammar. On the left are the input text areas. At every keystroke (or clicking the execute button) the grammar will be parsed and if it is without errors, the input below is parsed according to the given grammar. Errors and results are shown in the text areas on the right. If the abstract syntax tree gets large, the button 'show tree' needs to be clicked to show it. The button 'evaluate' will acitvate the evaluator described below.

Enter the grammar:

The input to be parsed by the grammar:

Abstract syntax tree evaluator

When the 'evaluate' button is clicked, the abstract syntax tree is evaluated according to the annotations of the nodes and the basic values. Below the supported annotations are mentioned and their function is explained:

add: Add the values of the two child nodes.
sub: Substract the values of the two child nodes.
times: Multiplies the values of the two child nodes.
div: Divides the values of the two child nodes.
min: Minus the value of the one child node.
mod: Modulo of the value of the two child nodes.
sum: Sums the values of the child nodes.
multiply: Multiplies the values of the child nodes.
lt: True if the value of the left node is smaller than the value of the right value.
le: True if the value of the left node is smaller or equal than the value of the right value.
gt: True if the value of the left node is greater than the value of the right value.
ge: True if the value of the left node is greater or equal than the value of the right value.
eq: True if the value of the left node is equal to the value of the right value.
ne: True if the value of the left node is not equal to the value of the right value.
and: True if the values of the two child nodes are true.
or: True if at least on of the values of the two child nodes is true.
not: True if the value of the one child node is not true.
ass: Assigns the value of the second node to the variable named by the first node.
assto: Assigns the value of the first node to the variable named by the second node.
ifthenelse: If the value of the first node is true, the second node is evaluated, otherwise the third
whenotherwise: If the value of the second node is true, the first node is evaluated, otherwise the third.
while: While the value of the first nodes evaluates to true, the second node is evaluated.
print: Prints the value of the only child node on a new line.
prompt: Prompts the user for a value displaying the text value of the only child node.
LIST: Evaluates the child nodes and returns the value of the last node.
defaultsto: Returns the value of the first node, unless it is undefined, then the value of the second node is returned.

The result of the evaluation (and the result of print statements) is displayed in the output text area.

An example grammar using the above annotations is:

root : statement SEQ OPT .
statement
	: ident "=" expr ";" [ass]
	| "if" expr "then" statement SEQ OPT "else" statement SEQ OPT "fi" [ifthenelse]
	| "while" expr "do" statement SEQ OPT "od" [while]
	| "print" expr ";" [print]
	| expr ";"
	.
	
primary_expr
        : ident
        | int
        | char
        | string
        | "(" expr ")"
        .

unary_expr
        : "!" primary_expr [not]
        | "-" primary_expr [min]
        | primary_expr
        .

l_expr1 : l_expr1 "*" unary_expr  [times]
        | l_expr1 "/" unary_expr  [div]
        | l_expr1 "%" unary_expr  [mod]
        | unary_expr
        .
l_expr2 : l_expr2 "+" l_expr1  [add]
        | l_expr2 "-" l_expr1  [sub]
        | l_expr1 
        .
l_expr3 : l_expr3 "<=" l_expr2  [le]
        | l_expr3 ">=" l_expr2  [ge]
        | l_expr3 "<"  l_expr2  [lt]
        | l_expr3 ">"  l_expr2  [gt]
        | l_expr3 "==" l_expr2  [eq]
        | l_expr3 "!=" l_expr2  [ne]
        | l_expr2
        .
l_expr4 : l_expr4 "&&" l_expr3 [land]  | l_expr3 .
l_expr5 : l_expr5 "||" l_expr4 [lor] | l_expr4 .

expr
        : l_expr5 "?" l_expr5 ":" expr  [ifthenelse]
        | l_expr5
        .

An example program in this language, for calculating the greatest common divisor of 345 and 555, is:

a = 345;
b = 555;
while b != 0
do
   c = a % b;
   a = b;
   b = c;
od
print a;

Author: Frans
Personal website
email address