[About] [Download] [Contact]
[About]

Here you can find the text of a working LL(1) grammar that is able to parse the Sinclair BASIC used in the original ZX Spectrum 16K/48K. A LL(1) grammar is not the best in terms of structural efficiency of the generated ASTs (LALR grammars are better), but it excels in terms of simplicity of parser execution.

This is another item in the software related to the ZX Spectrum that I have developed over the years.

The grammar covers the language for the mentioned models except the part concerning external devices (Microdrive, etc.), since they can provide varying syntax for the corresponding statements, nor the additional statements available in more modern models (e.g., AY sound in 128K). Neither should be difficult to add, anyway.

This grammar has been tested successfully with many programs of the 80s but also with a number of programs from the ZX Spectrum BASIC Jam written in 2017. In some of the latter, it has served to detect deviations from the Sinclair BASIC, such as the use of special commands used by BASIC compilers.

It has also served as the core for the ZX-Basicus utility to synthetize and analyze ZX Spectrum BASIC programs.

The format of the file is very simple: plain ASCII text with capitalized terminals and lower-case non-terminals. It does not follow the syntax of Bison or any other parser generator, but it should be straightforward to adapt.

// ZX SPECTRUM BASIC - ORIGINAL 48 SINCLAIR BASIC (FOR ZX 16K/48K ZX SPECTRUM)

// LL(1) Grammar v2.0.0 by (c) Juan-Antonio Fernandez-Madrigal, 2019
// http://jafma.net

// NOTES:

//	-Capitalized keywords: terminals ; non-capitalized keywords: non-terminals

//	-Like in the original BASIC, this grammar accepts empty statements (e.g., 
//	'10 ::'. Unlike in the original BASIC after tokenized, this grammar accepts 
//	empty lines (only the line number).

//	-External devices are needed for the ZX ROM to recognize these commands,
//	and if the devices are connected, they can provide their own syntax,
//	thus they are not included in the grammar: CAT, FORMAT, MOVE, ERASE,
//	OPEN, CLOSE.

//	-Other models of the ZX Spectrum have more commands, that are not included
//	in this grammar.

//  -We use semantic versioning for this grammar (https://semver.org/).

Terminals:
{
		// cat 0: miscellaneous terminals 
	{	LINENUMBER ENDOFLINE ENDOFSTAT BEGINPAR ENDPAR 
		COMMA SEMICOLON STRINGLIT NUMLIT VARNAME 
		ENDOFPROG } 
	
	// The next categories must go in this very order:

		// cat 1: statements (same order as in Lex.h :: CommandID)
	{ 	DEFFN MERGE VERIFY BEEP CIRCLE OUT LPRINT LLIST STOP READ RESTORE 
		NEW BORDER CONTINUE	DIM REM FOR GOTO GOSUB INPUT LOAD LIST 
		LET PAUSE NEXT POKE PRINT PLOT RUN SAVE RANDOMIZE IF CLS DRAW 
		CLEAR RETURN COPY } 
		// cat 2: substatements (SubcommandID)
	{	LINE THEN STEP AT TAB CHANNEL 
		APOSTROPHE } 
		// cat 3: functions (FunctionID)
	{	RND INKEY PI FN POINT ATTR VAL_STR VAL LEN SIN COS TAN ASN ACS 
		ATN LN EXP INT SQR SGN ABS PEEK IN USR STR CHR 
		NOT BIN } 
		// cat 4: operators (OperatorID)
	{	MULT DIV LESS GREAT EXPON OR AND LESSEQ GREATEQ 
		NOTEQ } 
		// cat 5: multiple (MultipleID)
	{	EQUAL PLUS MINUS TO SCREEN DATA CODE
		INK PAPER FLASH BRIGHT INVERSE OVER } 
}

Non-terminals:
{
		// cat 0: program
	{	program listoflines line 
		statements statementwithnormsep statementwiththensep
		restofstatementsnorm restofstatementsthen
		statement0 statement01 statement1 statement2 
		statementcons statementgraph statementcass 
		statementdeclv statementflownext statementflowif 
		statementdefs } 
		// cat 1: statements
	{	definitionfn filespecextra filespeccode restoffilespeccode
		optionalstep } 
		// cat 2: I/O
	{	plotseq printseq inputseq optionaldraw optionalinputitem
		printitem printsep restofprintseq 
		printcontrol attrcontrol loccontrol channelcontrol 
		inputitem restofinputseq inputseqorempty
		plotitems plotitem restofplotitems } 
		// cat 3: variables
	{	funcparms restofparms 
		listofreadvars restoflistofreadvars 
		topvar
		vardims dims restofdims 
		var	optionalseqofindexes firstparindexes tailofindexes 
		optionalmoreseqofindexes moreparindexes index
		indexes indexorslice indexorslice1 restofindexofslice1 
		indexorslice2 restofrestofindexofslice } 
		// cat 4: expressions
	{ 	toplevelexpr topleveloptexpr toplevelexprnonvar
		expression restofexpr listofexprs restoflistofexprs
		fnoptionallistofexprs fnlistofexprs fnrestoflistofexprs
		expressionnonvar 
		highprecexpr highprecexprnonvar usrfunctioncall sysfunctioncall
		op }
}

Rules:
{

// ---------------- HIGH LEVEL: program, lines and statements

program : listoflines ENDOFPROG  ;  

listoflines :  line ENDOFLINE listoflines | /* empty */ ; 

line : LINENUMBER statements  ; 

statements : statementwithnormsep restofstatementsnorm
			| statementwiththensep restofstatementsthen
			| restofstatementsnorm /* to allow for ": :" */ ;

statementwithnormsep :	statement0 | statement01 | statement1 | statement2 
			|	statementcons | statementgraph | statementcass
			|	statementflownext | statementdeclv | statementdefs ;

statementwiththensep : statementflowif ;

restofstatementsnorm : ENDOFSTAT statements  | /* empty */ ; 

restofstatementsthen : THEN statements ; 
/* In BASIC, THEN can be followed by stat or by ':' or by nothing. */

statement0 : COPY
		| STOP 
		| NEW 
		| CONTINUE 
		| CLS 
		| RETURN 
		| REM  /* This terminal embeds the comment inside */ ;

statement01 : LLIST topleveloptexpr
		| LIST topleveloptexpr 
		| RUN topleveloptexpr 
		| RANDOMIZE topleveloptexpr 
		| RESTORE topleveloptexpr 
		| CLEAR topleveloptexpr ;

statement1 : MERGE toplevelexpr 
		| INK toplevelexpr 
		| PAPER toplevelexpr 
		| FLASH toplevelexpr 
		| BRIGHT toplevelexpr 
		| INVERSE toplevelexpr 
		| OVER toplevelexpr 
		| BORDER toplevelexpr 
		| GOTO toplevelexpr 
		| GOSUB toplevelexpr 
		| PAUSE toplevelexpr ;

statement2 : BEEP toplevelexpr COMMA toplevelexpr
		| OUT toplevelexpr COMMA toplevelexpr 
		| POKE toplevelexpr COMMA toplevelexpr ;

statementcons : LPRINT printseq 
		| INPUT inputseq 
		| PRINT printseq ;

statementgraph : PLOT plotseq toplevelexpr COMMA toplevelexpr 
		| CIRCLE plotseq toplevelexpr COMMA toplevelexpr COMMA toplevelexpr 
		| DRAW plotseq toplevelexpr COMMA toplevelexpr optionaldraw ;

statementcass : VERIFY toplevelexpr filespecextra 
		| LOAD toplevelexpr filespecextra  
		| SAVE toplevelexpr filespecextra ;

statementdeclv : DIM VARNAME vardims
		| LET topvar EQUAL toplevelexpr
		| READ listofreadvars 
		| FOR VARNAME EQUAL toplevelexpr TO toplevelexpr optionalstep ;
/* Notice that "statementcons" also can include declarations of vars, e.g., 
in INPUT */

statementflownext : NEXT VARNAME ;

statementflowif : IF toplevelexpr ; 

statementdefs : DEFFN definitionfn 
			| DATA listofexprs ;


// ---------------- STATEMENT: DEF FN

definitionfn : VARNAME BEGINPAR funcparms ENDPAR EQUAL toplevelexpr ;
/* funcparms may repeat the same VARNAME and it will be considered different 
(no recursive calls will be made in execution) */

funcparms :	VARNAME restofparms  | /* empty */ ; 

restofparms : COMMA funcparms  | /* empty */ ;  


// ---------------- STATEMENT: FILE MANAGEMENT STATEMENTS

filespecextra : DATA VARNAME BEGINPAR ENDPAR 
			|   CODE filespeccode 
			|	SCREEN	 
			|	LINE toplevelexpr
			| /* empty */ ; 

filespeccode : toplevelexpr restoffilespeccode | /* empty */ ; 

restoffilespeccode : COMMA toplevelexpr | /* empty */ ; 


// ---------------- STATEMENT: PRINT

printseq : printitem restofprintseq 
		|  printsep printseq
		|  /* empty */ ;

printitem : toplevelexpr | printcontrol ;

restofprintseq : printsep printseq | /* empty */ ;

printsep :	COMMA | APOSTROPHE | SEMICOLON  ; 

printcontrol :  attrcontrol 
			|	loccontrol 
			| 	channelcontrol  ; 

attrcontrol :	PAPER toplevelexpr 
			|	INK toplevelexpr 
			|	BRIGHT toplevelexpr 
			|	FLASH toplevelexpr 
			|	INVERSE toplevelexpr 
			|	OVER toplevelexpr  ; 

loccontrol :	AT toplevelexpr COMMA toplevelexpr 
			|	TAB toplevelexpr ; 

channelcontrol :	CHANNEL toplevelexpr  ; 


// ---------------- STATEMENT: INPUT

inputseq : inputitem restofinputseq ;

inputitem : toplevelexprnonvar
			| topvar
			| LINE topvar
			| printcontrol ;

optionalinputitem : inputitem
			| /* empty */ ;

inputseqorempty : optionalinputitem restofinputseq ; 

restofinputseq : printsep inputseqorempty | /* empty */ ;


// ---------------- STATEMENT: PLOT & DRAW

plotseq :	plotitems  | /* empty */ ; 

plotitems :	plotitem printsep restofplotitems  ; 

restofplotitems :	plotitems  | /* empty */ ; 

plotitem : attrcontrol ; 

optionaldraw :	COMMA toplevelexpr  | /* empty */ ; 			


// ---------------- STATEMENT: READ

listofreadvars :	topvar restoflistofreadvars ; 

restoflistofreadvars :	COMMA listofreadvars  | /* empty */ ; 


// ---------------- STATEMENT: DIM

vardims : BEGINPAR dims ENDPAR ; /* "dims" cannot be empty */

dims :	toplevelexpr restofdims  ;  

restofdims :	COMMA dims  | /* empty */ ; 


// ---------------- STATEMENT: FOR

optionalstep :	STEP toplevelexpr  | /* empty */ ; 


// ---------------- VARS IN EXPRESSIONS

topvar : var ;  /* To distinguish between vars embedded in an expression and 
				   isolated vars as in LET, etc. */

var : VARNAME optionalseqofindexes ; /* General use of a variable in an expr. */
/* A string var may be indexed several times, e.g., 'a$( TO 8)( TO 3)( TO 1)'
Any var can have several indexes in the first parenthesis (if it is a matrix)
separated by comma, but only one in each of the following. 
If the variable is an array, it cannot have "TO" slices except in the last index
of the first parenthesis, as long as it is string; that must be checked out 
besides parsing.
This grammar cannot check whether VARNAME is not a string but a TO is used
(error); that must also be done besides parsing. */

optionalseqofindexes :	firstparindexes optionalmoreseqofindexes | /* empty */;

optionalmoreseqofindexes : moreparindexes optionalmoreseqofindexes | /* empty*/;

firstparindexes : BEGINPAR indexes ENDPAR ; 

moreparindexes : BEGINPAR index ENDPAR ;

indexes : indexorslice tailofindexes  | /* empty */ ; 

index : indexorslice | /* empty */ ;

tailofindexes :	COMMA indexorslice tailofindexes  | /* empty */ ; 

indexorslice : indexorslice1 | indexorslice2 ; 

indexorslice1 : expression restofindexofslice1 ; 

indexorslice2 : TO restofrestofindexofslice ; /* Slicing beginning with TO */

restofindexofslice1 : TO restofrestofindexofslice | /* empty */ ; 

restofrestofindexofslice : expression | /* empty */ ; 



// ---------------- MID LEVEL: EXPRESSIONS

topleveloptexpr : toplevelexpr | /* empty */ ; 

toplevelexpr : expression ; 
/* Cannot be nested, unlike expression */

toplevelexprnonvar : expressionnonvar ; 
/* Cannot be nested, unlike expressionnonvar */

listofexprs : expression restoflistofexprs ; 

restoflistofexprs : COMMA listofexprs  | /* empty */ ; 

expression : highprecexpr restofexpr ; 
/* We cannot distinguish between numeric and string expressions in this grammar
because the DATA statement needs a list of expressions with mixed types
(numeric / string), and that would make the grammar non-LL(1). Therefore, the
semantic parsing is in charge of detecting forbidden expressions (operations
not allowed on numeric or on strings, for instance). */

expressionnonvar : highprecexprnonvar restofexpr ;

highprecexpr : highprecexprnonvar
		| var ; 

highprecexprnonvar : NUMLIT /* Compiled no. embedded in terminal */
		| STRINGLIT optionalmoreseqofindexes /* slice/index a str literal */
		| BEGINPAR expression ENDPAR optionalmoreseqofindexes /* idem */
		| MINUS highprecexpr /* unary minus */
		| PLUS highprecexpr /* unary plus */
		| FN usrfunctioncall 
		| sysfunctioncall ;

restofexpr : op expression | /* empty */ ;

op : 	EXPON |	MULT | DIV
	  | PLUS | MINUS /* binary minus and plus */
	  | EQUAL | GREAT | LESS | GREATEQ | LESSEQ | NOTEQ 
	  | AND | OR ;
    
/* In the ZX, the sys functions have higher priority than the rest of the 
expression where they are */

sysfunctioncall : RND 
		| PI 
		| CODE highprecexpr
		| VAL highprecexpr 
		| LEN highprecexpr 
		| SIN highprecexpr 
		| COS highprecexpr 
		| TAN highprecexpr 
		| ASN highprecexpr 
		| ACS highprecexpr 
		| ATN highprecexpr 
		| LN highprecexpr 
		| EXP highprecexpr 
		| INT highprecexpr 
		| SQR highprecexpr 
		| SGN highprecexpr 
		| ABS highprecexpr 
		| PEEK highprecexpr 
		| IN highprecexpr 
		| NOT highprecexpr 
		| BIN NUMLIT
		| USR highprecexpr	
		| POINT BEGINPAR expression COMMA expression ENDPAR 
		| ATTR BEGINPAR expression COMMA expression ENDPAR 
		| INKEY 
		| SCREEN BEGINPAR expression COMMA expression ENDPAR  
		| VAL_STR highprecexpr 
		| STR highprecexpr 
		| CHR highprecexpr ; 

usrfunctioncall : VARNAME BEGINPAR fnoptionallistofexprs ENDPAR  ; 

fnoptionallistofexprs :	fnlistofexprs  | /* empty */ ; 

fnlistofexprs : expression fnrestoflistofexprs ; 

fnrestoflistofexprs : COMMA fnlistofexprs  | /* empty */ ; 

} // end of rules

In the following there is an example of the resulting AST from a very simple program (graphical tree drawn by yEd). Here you can appreciate the complexity of the tree in spite of the simplicity of the code. A depth-first traversal of the leaves of the tree will produce the original code.

[Download]

In this link you can download the version v2.0.0 of the grammar.

[Contact]

This grammar has been written and tested by Juan-Antonio Fernández-Madrigal in Autumn 2018 / Spring 2019.

If you are interested in contacting me, you can use "software" (remove quotes) at jafma.net.