Jacc's Yacc-Like Commands

This page describes the commands used by Jacc (a.k.a. Just Another Compiler Compiler). Commands appear in the declaration section of a grammar specification. All commands are optional, and their effects are as follows:

Conventional yacc commands:

%start

Specifies the grammar's start symbol. The default is the first declared root if there is one; otherwise, the first rule's LHS.

%token

Declares one or several tokens that share the same precedence level and have no specified associativity. The format is the following:

%token [looseness] token₁ [... token_n]

(where the square brackets mean "optional"). The looseness, if specified, must be an integer in the range [1..1200], following Prolog's binding looseness convention: the higher the looseness number, the lower the precedence. When it is specified, it sets the precedence of the declared tokens. When it is not specified, the precedence is implicitly set to a higher one than the last implicit precedence value used in a previous similar declaration.

%left

Declares one or several left-associative tokens of equal precedence level. The format is the following:

%left [looseness] token₁ [... token_n]

%right

Declares one or several right-associative tokens of equal precedence level. The format is the following:

%right [looseness] token₁ [... token_n]

%nonassoc

Declares one or several non-associative tokens of equal precedence level. The format is the following:

%nonassoc [looseness] token₁ [... token_n]
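The looseness convention shared by %token, %left, %right, and %nonassoc can be illustrated with a small Java sketch. This is not part of Jacc: the class LoosenessDemo, its methods, and the looseness values (borrowed from the conventional Prolog operator table) are assumptions chosen only to show that a lower looseness number means a higher precedence.

```java
import java.util.HashMap;
import java.util.Map;

final class LoosenessDemo {
    // Hypothetical looseness values, for illustration only.
    static final Map<String, Integer> LOOSENESS = new HashMap<>();
    static {
        LOOSENESS.put("=", checkRange(700));
        LOOSENESS.put("+", checkRange(500));
        LOOSENESS.put("*", checkRange(400));
    }

    // A looseness must be an integer in the range [1..1200].
    static int checkRange(int looseness) {
        if (looseness < 1 || looseness > 1200) {
            throw new IllegalArgumentException("looseness out of range: " + looseness);
        }
        return looseness;
    }

    // The lower the looseness, the higher the precedence (the tighter the binding).
    static boolean bindsTighter(String a, String b) {
        return LOOSENESS.get(a) < LOOSENESS.get(b);
    }

    public static void main(String[] args) {
        System.out.println(bindsTighter("*", "+")); // prints "true"
        System.out.println(bindsTighter("=", "+")); // prints "false"
    }
}
```

With these values, "*" (looseness 400) binds tighter than "+" (looseness 500), which in turn binds tighter than "=" (looseness 700).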

%{

Marks the beginning of a section of Java code that is part of the parser's class.

%}

Marks the end of a section of Java code that is part of the parser's class.

Jacc's additional commands:

In addition to the commands above, the following new commands are available. All are optional as well:

%access

Declares an access tag for the generated parser class. It must be one of public, private, or protected. (Defaults to none - i.e., package access.)

%package

Declares a package name for the generated parser. (Defaults to none.) This is inherited by all the node classes that are declared public (see %nodeclass).

%import

Declares one or more classes or packages to be imported by the generated parser's class. It can be used several times, or with several arguments separated by spaces or semicolons. This information is inherited by all the node classes that are declared public (see %nodeclass).

%root

Declares a nonterminal to be a partial parse root. This generates two methods in the parser to parse the partial unit. More precisely, declaring:

%root foo

will generate the following methods in the parser:

    public final void parseFoo (String) throws IOException;

    public final void parseFoo (java.io.Reader) throws IOException;

which can be used for the obvious purpose. One can always use the normal parse(), parse(boolean), or parse(int) methods for full parsing. The start symbol (i.e., declared with the %start command) is always implicitly declared as a partial parse unit. Hence, the two partial parse methods are always generated for it. Note that using these partial parsing methods disables error recovery. This is because of a technicality; namely, error recovery may put the parsing automaton in a state that does not belong to a sublanguage's parser. Therefore, an error encountered while performing a partial parse is always fatal (i.e., a FatalParseErrorException is thrown). This makes sense as these parses are meant to be done on relatively short strings.

As an example of partial parsing, consider the calculator grammar, in which two partial parsing units are declared (namely, expression and definition). The tokenizer and the driver for the full calculator remain unchanged, but a partial Calculator application can now be defined as well. Note that it uses the same tokenizer.

%nodeclass

Declares a new node class, extending the default one, for objects pushed on the parser's stack. The format is as follows:

%nodeclass [ public ] foo [ extends bar ] [ implements interface₁ [ , ... , interface_n ] ] [ locates buz ]
{ [ <Java class member declarations> ] }

(where the square brackets mean "optional"). The parameters foo and bar must be the names of nonterminal symbols, and buz the name of an attribute of this node class (defined here or inherited) that must be of a type implementing the Locatable interface.

This declaration defines a Java class whose name is $foo$ and whose body is as specified. If the extends clause is missing, the class $foo$ is a direct subclass of the class ParseNode; otherwise, it is a direct subclass of the node class $bar$, which must itself also be declared with a %nodeclass command (missing declarations are eventually reported as errors). If an implements clause is present, the class $foo$ is declared to implement all the specified interfaces, so the body of the %nodeclass should contain the appropriate definitions implementing them.

Unless the public option is specified, this class is written as part of the parser's file. A public class is written to a separate file named $foo$.java (as mandated by Java). The default $...$ class name bracketing may be changed with either or both of the commands %nodeprefix and %nodesuffix. If a locates clause is present, the location of this node class is automatically transmitted to the Locatable attribute it indicates.

%nodeprefix

Declares a string of characters to prepend to the class name generated by a %nodeclass declaration. The default is "$".

%nodesuffix

Declares a string of characters to append to the class name generated by a %nodeclass declaration. The default is "$".
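The class name bracketing controlled by %nodeprefix and %nodesuffix amounts to simple string concatenation, as the following Java sketch shows. It is not Jacc's code: the class NodeClassNames and its methods are hypothetical, and the non-default prefix and suffix in the last call are made up for illustration.

```java
final class NodeClassNames {
    // Name of the Java class generated for a %nodeclass declared on a nonterminal.
    static String className(String prefix, String nonterminal, String suffix) {
        return prefix + nonterminal + suffix;
    }

    // File that holds the class when it is declared public.
    static String fileName(String prefix, String nonterminal, String suffix) {
        return className(prefix, nonterminal, suffix) + ".java";
    }

    public static void main(String[] args) {
        // Default bracketing: both prefix and suffix are "$".
        System.out.println(className("$", "foo", "$"));   // prints "$foo$"
        System.out.println(fileName("$", "foo", "$"));    // prints "$foo$.java"
        // Hypothetical setting: empty prefix, suffix "Node".
        System.out.println(className("", "foo", "Node")); // prints "fooNode"
    }
}
```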

%xmlinfo

Associates an XML serialization pattern with a terminal symbol. The pattern is used when a leaf parse node of the concrete syntax tree is transduced into a JDOM XML tree element. The format is as follows:

%xmlinfo symbol [ annotation ]

where the first argument is a (possibly single- or double-quoted) string. Here the square brackets do not mean "optional": they are a required part of the command and enclose an XML serialization annotation, which may run over several lines as needed. The annotation consists of entries of the form:

nsprefix : prefix

localname : name

attributes : { attr₁ = value₁ , ... , attr_n = value_n }

children : ( i₁ , ... , i_n )

All entries are optional except for the localname entry. This notation is described in more detail here.

%xmlroot

Declares the XML tree's root element's name and namespace prefix. The format is of the following form:

%xmlroot [ nsprefix ] localname

(where the square brackets mean "optional"). Both arguments are (possibly single- or double-quoted) strings.

%xmlns

Declares an XML namespace and its prefix for the XML tree's root element. The format is of the following form:

%xmlns nsprefix uri

The first argument is a (possibly single- or double-quoted) string. The second must be a (single- or double-) quoted string.

%include

Indicates that the file with the specified name is to be included at this point. This command can be used several times, and files are included in the order in which the commands appear. An included file may itself contain %include commands. Circular inclusion is detected and flagged as an error. This command can also be used in the <classes> section of the grammar to include ancillary classes or programs defined in other files.
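Ordered inclusion with circularity detection can be sketched in Java as follows. This is a minimal illustration, not Jacc's implementation: the class IncludeExpander is hypothetical, files are modeled as an in-memory map from names to lines, and the sketch assumes that an %include line carries a single unquoted file name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IncludeExpander {
    // Expands %include lines in place, preserving order.
    static List<String> expand(String file, Map<String, List<String>> files) {
        List<String> out = new ArrayList<>();
        expandInto(file, files, new ArrayDeque<>(), out);
        return out;
    }

    private static void expandInto(String file, Map<String, List<String>> files,
                                   Deque<String> active, List<String> out) {
        // A file that is already being expanded is included circularly.
        if (active.contains(file)) {
            throw new IllegalStateException("circular inclusion of " + file);
        }
        List<String> lines = files.get(file);
        if (lines == null) {
            throw new IllegalArgumentException("no such file: " + file);
        }
        active.push(file);
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("%include ")) {
                String included = trimmed.substring("%include ".length()).trim();
                expandInto(included, files, active, out);
            } else {
                out.add(line);
            }
        }
        active.pop();
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new HashMap<>();
        files.put("main.grm", Arrays.asList("%token A", "%include tokens.grm", "%start s"));
        files.put("tokens.grm", Arrays.asList("%token B"));
        System.out.println(expand("main.grm", files)); // prints "[%token A, %token B, %start s]"
    }
}
```

The stack of active files is what distinguishes a genuine cycle from a file that is merely included twice in sequence, which this sketch allows.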

%usefile

Deprecated - use %include instead.

%precstep

Declares an integer (possibly negative!) by which the precedence level is incremented for each new set of tokens declared in a single %token, %left, %right, or %nonassoc command.

%dynamic

For certain grammars, parsing may need to behave non-deterministically in certain situations. This is enabled by using the %dynamic command, with or without an argument:

Without an argument, this command indicates that ambiguous tokens and/or parse actions are to be handled properly. This is useful when tokens may belong to distinct lexical categories or when, despite one's best efforts (i.e., even using precedence and associativity information), there remain statically unresolvable shift/reduce (S/R) or reduce/reduce (R/R) conflicts due to inherent grammar ambiguities.

The %dynamic command may also take one argument: the name of a nonterminal symbol denoting a dynamic operator token category (read the specification for details). The %dynamic command may occur several times in a grammar, each time with a different argument.

In addition, dynamic operator commands may be used if defined by the %dynamic command. For example, declaring:

%dynamic op

enables the %op command to be used to define dynamic operators of this category (for more details, read the section on declaring dynamic operators in the specification).

Author: Hassan Aït-Kaci
Version: Last modified on Wed Aug 21 18:26:23 2019 by hak
