This was created for an online workshop on parsing which was held on and was recorded.
The source code can be found here. The code for showing the syntax tree is based on Syntactic Tree Viewer by Christos Christodoulopoulos, which makes use of d3.js and is itself inspired by D3.js Drag and Drop Zoomable Tree by Rob Schmuecker. A similar approach, with a more advanced editor (with syntax colouring), is AGL Editor Demo.
root : int ident string .
This grammar consists of one production rule. The production rule starts with the root non-terminal, which is the non-terminal used to parse the whole input. The elements of the production rule follow the colon. Elements can be terminals or non-terminals, and in this case the rule consists of three terminals. The production rule ends with a period. There are four predefined terminals:
123 abc "test ?*"
The abstract syntax tree resulting from parsing this input is printed as:
The following grammar parses the same input as the one above, but it uses two extra non-terminals and two additional production rules for those non-terminals:
root : first rest .
first : int .
rest : ident string .
However, this does change the abstract syntax tree, which now looks like:
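To make the difference between the two tree shapes concrete, here is a minimal Python sketch of hand-written recursive-descent parsers for both grammars. It is not the code behind this page: the token shapes (decimal digits for int, a letter or underscore followed by word characters for ident, double-quoted text without escapes for string) are simplifying assumptions, and the trees are plain nested tuples in which each non-terminal becomes a node.

```python
import re

# Simplified token shapes (assumptions, not the tool's exact rules):
# int = decimal digits, ident = letter/underscore then word characters,
# string = double-quoted text without escape sequences.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<int>\d+)|(?P<ident>[A-Za-z_]\w*)|(?P<string>"[^"]*"))')


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; whitespace is skipped."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def expect(self, kind):
        """Consume the next token if it has the given kind; return its lexeme."""
        if self.pos < len(self.tokens) and self.tokens[self.pos][0] == kind:
            self.pos += 1
            return self.tokens[self.pos - 1][1]
        raise SyntaxError(f"expected {kind} at token {self.pos}")

    def finish(self, tree):
        """Reject trailing tokens, then return the finished tree."""
        if self.pos != len(self.tokens):
            raise SyntaxError("unexpected trailing input")
        return tree


def parse_flat(text):
    # root : int ident string .   -> one node with three leaves
    p = Parser(text)
    return p.finish(("root", p.expect("int"), p.expect("ident"), p.expect("string")))


def parse_nested(text):
    # root : first rest .  first : int .  rest : ident string .
    p = Parser(text)
    first = ("first", p.expect("int"))
    rest = ("rest", p.expect("ident"), p.expect("string"))
    return p.finish(("root", first, rest))


print(parse_flat('123 abc "test ?*"'))
print(parse_nested('123 abc "test ?*"'))
```

Both functions accept exactly the same input; only the nesting of the result differs, mirroring how each extra non-terminal adds a level to the tree.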
To parse input that consists of the keyword if followed by an identifier between round brackets, the following grammar can be used:
root : "if" "(" ident ")" .
In this grammar rule, literal strings between double quotes are used as additional terminals. Literal strings that start with an alphabetic character are treated as keywords: in the input, a keyword must be followed by a character that cannot be part of an identifier, and the keyword is no longer accepted as a valid identifier. An example of input that parses correctly is:
if ( a )
The following input does not parse correctly because, although it starts with the letters if, its first token is the identifier iff rather than the keyword if:
iff ( a )
The following input does not parse correctly either, because if is no longer a valid identifier:
if ( if )
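The keyword rule can be illustrated with a small Python sketch, assuming a tokenizer that first reads a maximal run of identifier characters and only then checks whether that word is one of the grammar's keywords. This illustrates the idea rather than the tool's own implementation; the names KEYWORDS, tokenize, parse_if, and accepts are hypothetical.

```python
import re

KEYWORDS = {"if"}  # literals in the grammar that start with a letter

# A word is a maximal run of identifier characters; "(" and ")" are symbols.
TOKEN_RE = re.compile(r'\s*(?:(?P<word>[A-Za-z_]\w*)|(?P<sym>[()]))')


def tokenize(text):
    """Return (kind, lexeme) pairs; a keyword's kind is the keyword itself."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        lexeme = m.group(m.lastgroup)
        if m.lastgroup == "word" and lexeme not in KEYWORDS:
            tokens.append(("ident", lexeme))  # ordinary identifier
        else:
            tokens.append((lexeme, lexeme))   # keyword or symbol
        pos = m.end()
    return tokens


def parse_if(text):
    """root : "if" "(" ident ")" .  Returns ("root", identifier)."""
    tokens = tokenize(text)
    kinds = [kind for kind, _ in tokens]
    if kinds != ["if", "(", "ident", ")"]:
        raise SyntaxError(f"token kinds {kinds} do not match the rule")
    return ("root", tokens[2][1])


def accepts(text):
    """True if parse_if succeeds on text."""
    try:
        parse_if(text)
        return True
    except SyntaxError:
        return False


for sample in ["if ( a )", "iff ( a )", "if ( if )"]:
    print(f"{sample!r}: {accepts(sample)}")
```

Because iff is read as one word before the keyword check, it becomes an identifier; and because if is always classified as a keyword, it can never fill the ident position.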
To parse input that consists of an optional integer, one or more identifiers, and zero or more strings, the following grammar can be used:
root : int OPT ident SEQ string SEQ OPT .
An OPT placed after an element indicates that the element is optional. A SEQ placed after an element indicates that the element can occur one or more times. The combination SEQ OPT indicates that the element can occur zero or more times.
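The meaning of OPT, SEQ, and SEQ OPT can be sketched in Python as small loops over a token list. This is a hypothetical illustration with simplified token shapes (digits, identifiers, double-quoted strings without escapes), not the tool's parser. Because each element in this rule has a different token kind, consuming tokens greedily never needs to backtrack.

```python
import re

# Simplified token shapes (assumptions, not the tool's exact rules).
TOKEN_RE = re.compile(
    r'\s*(?:(?P<int>\d+)|(?P<ident>[A-Za-z_]\w*)|(?P<string>"[^"]*"))')


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; whitespace is skipped."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse_root(text):
    """root : int OPT ident SEQ string SEQ OPT .
    Returns ("root", int_or_None, [idents], [strings])."""
    tokens, pos = tokenize(text), 0

    def at(kind):
        return pos < len(tokens) and tokens[pos][0] == kind

    def optional(kind):
        # OPT: zero or one occurrence
        nonlocal pos
        if at(kind):
            pos += 1
            return tokens[pos - 1][1]
        return None

    def repeated(kind):
        # Zero or more occurrences, consumed greedily
        nonlocal pos
        values = []
        while at(kind):
            values.append(tokens[pos][1])
            pos += 1
        return values

    number = optional("int")      # int OPT
    idents = repeated("ident")    # ident SEQ: must not be empty
    if not idents:
        raise SyntaxError("expected at least one identifier")
    strings = repeated("string")  # string SEQ OPT: may be empty
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][1]!r}")
    return ("root", number, idents, strings)


print(parse_root('42 a b "x" "y"'))
print(parse_root('a'))
```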
To parse input that consists of a sum of integers, the following grammar can be used:
root : int CHAIN "+" .
CHAIN followed by a literal indicates a chain sequence, in which the literal is used as the separator between the elements. An example input is:
1 + 6 + 5
LIST is shorthand for a chain sequence with a comma as the separator; that is, X LIST is equivalent to X CHAIN ",".
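A chain sequence maps naturally onto a loop: parse one element, then keep going while the next token is the separator. The Python sketch below is a hypothetical illustration (tokenize, parse_int, chain, and parse_whole are not part of the tool), and it treats LIST exactly as CHAIN with a comma separator, following the description of LIST as a comma chain.

```python
import re


def tokenize(text):
    """Split into integers and single non-whitespace characters
    (enough for these int-only grammars)."""
    return re.findall(r'\d+|\S', text)


def parse_int(tokens, pos):
    """Parse one int terminal; return (value, next_pos)."""
    if pos < len(tokens) and tokens[pos].isdecimal():
        return int(tokens[pos]), pos + 1
    raise SyntaxError(f"expected int at token {pos}")


def chain(tokens, pos, parse_element, separator):
    """element CHAIN separator: one element, then (separator element)*."""
    value, pos = parse_element(tokens, pos)
    items = [value]
    while pos < len(tokens) and tokens[pos] == separator:
        value, pos = parse_element(tokens, pos + 1)
        items.append(value)
    return items, pos


def parse_whole(text, separator):
    """root : int CHAIN separator .  Rejects trailing tokens."""
    tokens = tokenize(text)
    items, pos = chain(tokens, 0, parse_int, separator)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("root", items)


print(parse_whole("1 + 6 + 5", "+"))  # root : int CHAIN "+" .
print(parse_whole("1, 6, 5", ","))    # root : int LIST .
```

Note that a separator must always be followed by another element, so input such as 1 + is rejected.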
To parse input that consists of an intermixed list of integers and identifiers (separated by commas), the following grammar can be used:
root : ( int | ident ) LIST .
This will parse input like:
ad, 4, 5, gh, jk
The grammar rule above also shows the use of round brackets to group elements and of the vertical bar to separate alternatives. The vertical bar can also be used to combine several production rules for the same non-terminal into one.
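A group of alternatives can be sketched in Python as a function that tries each alternative in turn on the next token. The sketch below is hypothetical, assumes simplified token shapes (digits for int, a letter or underscore followed by word characters for ident), and reads LIST as CHAIN ",".

```python
import re

IDENT_RE = re.compile(r'[A-Za-z_]\w*')


def tokenize(text):
    """Split into integers, identifiers, and single other characters."""
    return re.findall(r'\d+|[A-Za-z_]\w*|\S', text)


def parse_int_or_ident(tokens, pos):
    """( int | ident ): try the alternatives in order on the next token."""
    if pos < len(tokens):
        lexeme = tokens[pos]
        if lexeme.isdecimal():
            return ("int", lexeme), pos + 1
        if IDENT_RE.fullmatch(lexeme):
            return ("ident", lexeme), pos + 1
    raise SyntaxError(f"expected int or ident at token {pos}")


def parse_root(text):
    """root : ( int | ident ) LIST .  with LIST read as CHAIN ","."""
    tokens = tokenize(text)
    value, pos = parse_int_or_ident(tokens, 0)
    items = [value]
    while pos < len(tokens) and tokens[pos] == ",":
        value, pos = parse_int_or_ident(tokens, pos + 1)
        items.append(value)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("root", items)


print(parse_root("ad, 4, 5, gh, jk"))
```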
When the input parses correctly, the abstract syntax tree is displayed in textual form in the text area to the right of the input, and as a tree in the area below it. To label the nodes of this abstract syntax tree, production rules can be annotated with identifiers between square brackets. An example is given in the grammar below, a grammar for simple expressions that add, multiply, and divide numbers:
root : E .
E : F CHAIN "+" [sum].
F : T | F "*" T [times] | F "/" T [div].
T : int | "(" E ")".
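How such a grammar can produce annotated tree nodes is sketched below as a recursive-descent parser in Python. Two points are assumptions of this sketch rather than documented behaviour of the tool. First, a left-recursive rule such as F : F "*" T cannot be coded directly as a recursive call, so it is rewritten as a loop that builds left-nested nodes. Second, a chain with only one element is returned without a sum node. Each node is a tuple whose first item is the annotation; the class name ExprParser is hypothetical.

```python
import re


def tokenize(text):
    """Integers, or any single non-whitespace character such as + * / ( )."""
    return re.findall(r'\d+|\S', text)


class ExprParser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse_root(self):
        # root : E .
        tree = self.parse_E()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def parse_E(self):
        # E : F CHAIN "+" [sum].
        terms = [self.parse_F()]
        while self.peek() == "+":
            self.advance()
            terms.append(self.parse_F())
        # Assumption: a one-element chain is returned without a sum node.
        return terms[0] if len(terms) == 1 else ("sum", *terms)

    def parse_F(self):
        # F : T | F "*" T [times] | F "/" T [div].
        # Left recursion rewritten as a loop that builds left-nested nodes.
        tree = self.parse_T()
        while self.peek() in ("*", "/"):
            label = "times" if self.advance() == "*" else "div"
            tree = (label, tree, self.parse_T())
        return tree

    def parse_T(self):
        # T : int | "(" E ")".
        tok = self.advance()
        if tok.isdecimal():
            return int(tok)
        if tok == "(":
            tree = self.parse_E()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return tree
        raise SyntaxError(f"unexpected token {tok!r}")


print(ExprParser("1 + 2 * 3").parse_root())
print(ExprParser("8 / 4 / 2").parse_root())
```

The loop in parse_F keeps division left-associative, so 8 / 4 / 2 is grouped as (8 / 4) / 2, which is what the left-recursive rule expresses.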
Whitespace is accepted between all terminals. Space, tab, and newline characters count as whitespace, as do the two kinds of comments allowed in the C language: any text between /* and */, and the remainder of a line following a // sequence.
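Skipping whitespace and comments is usually done in the tokenizer, just before each token is read. The following Python sketch assumes the three token kinds used by the grammars in this section (int, ident, string) and uses regular expressions both for skipping and for the tokens; it is an illustration, not the tool's own scanner. An unterminated /* comment is not skipped, so it is reported as an unexpected character.

```python
import re

# Space, tab, newline, /* ... */ (may span lines) and // to end of line.
SKIP_RE = re.compile(r'(?:[ \t\n]|/\*.*?\*/|//[^\n]*)*', re.DOTALL)

# Simplified token shapes (assumptions, not the tool's exact rules).
TOKEN_RE = re.compile(r'(?P<int>\d+)|(?P<ident>[A-Za-z_]\w*)|(?P<string>"[^"]*")')


def tokenize(text):
    """Split text into (kind, lexeme) tokens, skipping whitespace and comments."""
    tokens, pos = [], 0
    while True:
        pos = SKIP_RE.match(text, pos).end()  # always matches, possibly empty
        if pos == len(text):
            return tokens
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()


source = '''123 /* a block
comment */ abc // rest of line is ignored
"test ?*"'''
print(tokenize(source))
```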
For educational purposes, the ways in which terminals can be specified are limited. For more options, see the C++ implementation of IParse and the C implementation of RawParser.