These notes describe a simple annotation system meant to enable the current implementation of Jacc to support the generation of an XML-serializer for a language from its grammar. Such a tool's purpose is to produce an XML tree representation of a parsed program (i.e., a correctly parsed element of the language generated by a Jacc grammar) by exploiting simple optional annotations that may be added to appropriate rules.
Jacc also allows an additional form of annotation in the RHS of a rule to indicate the XML serialization pattern of the abstract syntactic tree (AST) node corresponding to a derivation with this rule. This XML serialization meta-annotation comes between square brackets ('[' and ']').
The most basic form of XML annotation is of the form:
[ XmlTag n_1 ... n_k ]where
XmlTag is an identifier to use as XML tag for this
node in the XML serialization of the AST, and the n_k's
are a sequence of numbers denoting positions of symbols in the RHS of
the rule (as for pseudo-variables but without '$'). The
number sequence indicates the order in which subnodes are to be
serialized.
For example, the annotated rule:
QUANTIF
: 'Exists' Var_plus '(' CONDIT ')'
[ Exists 2 4 ]
;
means that an AST node for this rule will be serialized thus:
<Exists> (Xml serialization of Var_plus) (Xml serialization of CONDIT) </Exists>Rules without XML serialization annotation follow a default behavior: the serialization is the concatenation of those of its RHS's constituents, eliminating punctuation tokens (i.e., empty nodes and literal tokens - namely, tokens that do not carry a value).
Using the above basic form and trusting the default bevavior for generating well-bracketted tag pairs is sufficient for simple serialization. However, whenever attributes are needed, or when the serialized AST form is not strictly homomorphic to the concrete syntax tree's, one needs more complex forms of annotation.
The current version of the implementation supports only the basic scheme and does not actually ever build an XML tree. It simply builds a subtree of the CST following the annotation pattern at each rule as it is reduced. It then outputs each node as an XML element using the tag recorded in each parse node recursively.
The first approach is sufficient if all the information needed for the synthesis of an XML element is available in the partial subtree available as the CST is being built in a bottom-up manner. However, it is not sufficient if information from siblings or ancestors is needed, since these are not existent at rule reduction time. (This is similar to the situation between synthesized and inherited attributes in attribute grammars.)
Since the annotation we will propose below for a node's XML representation relies exclusively on information from the CST subtree below a node and never on ancestors or siblings, we shall opt for the one-pass incremental bottom-up tree-building solution.
Jacc works from the leaves to the root (each CST tree node being a parse node - i.e., a ilog.language.syntax.ParseNode object). Thus, in order to build an XML AST from the CST, we proceed inductively from the parse nodes containing tokens, and work our way up at each rule reduction, building the xml tree as a field of the ilog.language.syntax.ParseNode standing for the reducing rule's LHS from the xml trees of its RHS.
In this way, we separate the parse node from its DOM compliant version. Hence, we will associate to each parse node a private instance variable _xmlNode of type org.dom4j.Document that will be set to a DOM compliant XML element representation of this parse node.
The annotation are per rule. The parse node corresponding to a rule's LHS symbol corresponds to the xml element specified by the annotation (if there is one) or the default value as per the default. Thus, the xml element of interior nodes may be built at rule reduction time.
For leaf nodes, however, the xml element is specified as the