|
[Description generated from file XmlAnnotationSpecification.grm]
Including file XmlAnnotationNodeClasses.grm, which defines some fields and methods for some non-terminal nodes used in the grammar's semantic actions.
A WrapperPath_opt stores the sequence of wrappers making up a WrapperPath in an ArrayList called wrapperPath.
A ChildXmlTree has the following fields:
An XmlChildSpec is a ChildXmlTree with the following additional field:
An XmlTreeRef is an XmlChildSpec with the following additional field:
An AttributeValue is a special case of XmlTreeRef. It has a flag hasLiteralValueContent indicating whether it carries a literal value; when this flag is true, the literal value is in the value field.
An XmlPath_opt stores the sequence of numbers making up an XmlPath in an IntArrayList called xmlPath.
Including file XmlAnnotationSpecification.grm, which contains the documentation describing the logical design of Jacc's XML serialization annotation notation. [See also the Java sources]
We need a means to annotate a Jacc grammar so as to ease and automate the process of specifying an XML serialization for the language defined by the grammar. We proceed by annotating some rules and terminals so as to produce an XML form built out of the XML forms built for the constituents of the CST (i.e., from a terminal's contents or a rule's RHS).
This specifies a grammar for a simple annotation language meant to enable passing XML formatting information from a Jacc grammar to a Jacc parser. It defines the form of what goes in the square-bracketed arguments of the xmlinfo command annotating a terminal, or appearing in a rule being annotated for XML conversion (i.e., for serialization purposes). Doing this gives us great flexibility for extending or modifying the annotation meta-syntax simply by:
Specification
Basic annotation notation
We first introduce the basic annotation notation for the very common case when the XML tree to be constructed from the CST is homomorphic to the CST in that it only needs information that is local to the CST node. We will extend this notation later for the case when the tree construction is heteromorphic, needing information from below this node.
In order for the parser of the annotation notation to stay
small and light-weight, to avoid ambiguity, and to remain
strictly within LALR(1) recognition power, we adopt the
following very simple keyword-driven syntax. For
example:
nsprefix
localname
attributes
children
Such an admissible keyword is followed by a value, which may be
either an identifier, a single- or double-quoted string, or a list
between curly braces {...}, or parentheses (...),
the nature of this list's brackets and elements depending on the
keyword. See the grammar rules for
details.
N.B.: The annotation is meant to be light-weight: all these keywords may be abbreviated to any non-empty case-insensitive prefix of their full form, and some punctuation may be used interchangeably or simply omitted. For example, the ":" separating keywords and values, the "," separating list elements, and unnecessary quotes are all optional. The following key/value separator symbols may be used: ":", "=", "->", "=>", or they may be simply omitted. Similarly, the following list separator symbols may be used: "," (comma), ";" (semicolon), or they may be simply omitted. See the associated tokenizer class XmlAnnotationTokenizer.java for details.
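The abbreviation and separator rules above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class AnnotationKeywords and its methods are hypothetical names and are not taken from XmlAnnotationTokenizer.java, whose actual logic may differ. Since the four keywords listed in this section start with distinct letters, any non-empty prefix identifies at most one of them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of Jacc): resolves abbreviated keywords
// and classifies the optional separator symbols described above.
final class AnnotationKeywords {
    // The four keywords listed in this section.
    static final List<String> KEYWORDS =
        Arrays.asList("nsprefix", "localname", "attributes", "children");

    // Returns the full keyword of which 'abbrev' is a non-empty,
    // case-insensitive prefix, or null if there is none.
    static String resolve(String abbrev) {
        if (abbrev == null || abbrev.isEmpty()) {
            return null;
        }
        String lower = abbrev.toLowerCase(Locale.ROOT);
        for (String keyword : KEYWORDS) {
            if (keyword.startsWith(lower)) {
                return keyword;
            }
        }
        return null;
    }

    // True for the key/value separators ":", "=", "->", "=>".
    static boolean isKeyValueSeparator(String token) {
        return token.equals(":") || token.equals("=")
            || token.equals("->") || token.equals("=>");
    }

    // True for the list separators "," and ";".
    static boolean isListSeparator(String token) {
        return token.equals(",") || token.equals(";");
    }
}
```

With this sketch, AnnotationKeywords.resolve("LO") and AnnotationKeywords.resolve("Local") both yield "localname", while an unknown word such as "xyz" yields null.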
For example, the same annotation as above could equally be
written as follows:
More complex annotation notation
The simple notation above is all one needs in many common cases: it works whenever the XML serialization pattern is constructible only from the immediate constituents of the rule's LHS (A0) - i.e., the XML trees of the rule's RHS symbols (n>0). It is, however, insufficient for expressing XML serialization patterns that depend on sub-elements contained within those of the XML serialization of the RHS symbols. The simple case is called homomorphic tree transduction, while the more complex case is called heteromorphic tree transduction. [NB: "homo-morphic" = "of similar form" (from the Greek "homo-morfos", meaning "same shape"), and "hetero-morphic" = "of dissimilar form" (from the Greek "hetero-morfos", meaning "different shape").]
A more elaborate XML annotation notation extends the above basic notation by allowing the values of attributes and children in the annotation to take on more complex forms denoting a reference to the desired XML constructs within the XML trees already built for the CST children of this node. Following are some simple color-coded examples illustrating the meaning of these annotations, showing how the basic notation for homomorphic tree-transduction annotations for attributes and children is extended to express heteromorphic tree-transduction as well.
Children annotation:
Attributes annotation:
Interpreted special forms:
In addition to the above notation (and default behavior), we provide the following conveniences to specify finer details of the XML appearance from the information present in the CST. These are built-in special forms, which all start with a dollar sign '$', followed by the (case-insensitive) form identifier and possible arguments between parentheses and separated by a legal list separator; namely, blank space, "," (comma), or ";" (semicolon).
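The lexical shape of a special form described above can be sketched in Java as follows. This is a minimal illustration, not the parser Jacc actually uses. The class name SpecialForm is hypothetical. The sketch also assumes that identifiers consist of letters, digits, and underscores, that white space may separate the identifier from the opening parenthesis, and that arguments are unquoted and not nested. Only $VALUE is a form name attested in this document; any other name used with the sketch is purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of a special form such as $VALUE or $name(arg1, arg2).
final class SpecialForm {
    final String name;        // form identifier, normalized to upper case
    final List<String> args;  // arguments, possibly empty

    // '$', an identifier, then an optional parenthesized argument list.
    private static final Pattern FORM = Pattern.compile(
        "\\$([A-Za-z_][A-Za-z0-9_]*)\\s*(?:\\(([^()]*)\\))?");

    private SpecialForm(String name, List<String> args) {
        this.name = name;
        this.args = args;
    }

    // Returns the parsed form, or null if 'text' is not a special form.
    static SpecialForm parse(String text) {
        Matcher m = FORM.matcher(text.trim());
        if (!m.matches()) {
            return null;
        }
        // Identifiers are case-insensitive, so normalize them.
        String name = m.group(1).toUpperCase(Locale.ROOT);
        List<String> args = new ArrayList<>();
        String body = m.group(2);
        if (body != null) {
            // Legal list separators: blank space, comma, or semicolon.
            for (String arg : body.trim().split("[\\s,;]+")) {
                if (!arg.isEmpty()) {
                    args.add(arg);
                }
            }
        }
        return new SpecialForm(name, args);
    }
}
```

Normalizing the identifier to upper case makes $value, $Value, and $VALUE denote the same form, which matches the case-insensitivity stated above.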
Checking Annotation Consistency
We need to enforce consistent number referencing in the tree addresses used in the notation - i.e., the numbers that refer to RHS nodes and XML elements (the ci's and the xi's below). Indeed, they should (be made to) obey the following necessary conditions (all easy to justify):
If all these conditions hold, then the code for the method xmlify(Element container) defined in the class ParseNode, and the method createXmlForm(ParseNode node, Element root) defined in the class XmlInfo, is guaranteed to work safely.
DTD/Schema extraction
Note that when all annotations are consistent, we may wish to extract more static information from the annotated grammar. It is indeed possible to infer the global nature of the admissible XML documents generated from a specific annotated grammar at parser-generation time using simple static analysis of the grammar. From this we may then generate a DTD or an XML Schema describing the type of XML documents produced from serializing well-formed syntactic units. This may then be optionally adjoined to the produced XML document as a seal of verifiable well-formedness.
Extracting the types of XML elements from annotations
For verifying a property such as [Condition 2] above, it is necessary to know the XML "element type" of the referenced XML node. This "type", of the form nsPrefix:localName, may be computed statically by analyzing the grammar's annotations, and deriving the exact XML element "type" for each tree reference in the annotations. This is done as follows.
To each grammar symbol A, we associate its _xmlFormType: a RegularExpression denoting the set of possible XML element types that A may expand into when serialized into its XML form:
Computing a DTD by fix-point closure of regular expressions
The idea is simply to iterate the above step until a fix point is reached (i.e., until no change occurs in any of the computed _xmlFormType's from one iteration to the next). Therefore, in order to enable effective comparison of RE's, the _xmlFormType's are kept in normal form.
Example
Consider the following annotated grammar:
Sexp : Atom
| '(' List ')' [ LO: List CH: (2) ]
;
Atom : NUMBER [ LO: Number AT: {value = $VALUE} ]
| NAME [ LO: Name AT: {value = $VALUE} ]
| NIL [ LO: Nil ]
;
List : [ LO: Nil ]
| List Sexp
;
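The fix-point iteration can be sketched in Java on this grammar, under strong simplifying assumptions that are not taken from the Jacc sources. Each _xmlFormType is approximated by a sorted set of local names instead of a full regular expression, so set equality plays the role of comparison in normal form. A rule annotated with a local name contributes that name, and an unannotated rule contributes the union of the types of its RHS non-terminals. Children and attribute specifications are ignored. The names XmlFormTypeFixPoint, Rule, compute, and sexpGrammar are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: element types approximated by sorted sets of names.
final class XmlFormTypeFixPoint {
    static final class Rule {
        final String lhs;
        final String localName;             // null if the rule is unannotated
        final List<String> rhsNonTerminals; // non-terminals of the RHS
        Rule(String lhs, String localName, String... rhsNonTerminals) {
            this.lhs = lhs;
            this.localName = localName;
            this.rhsNonTerminals = Arrays.asList(rhsNonTerminals);
        }
    }

    // Iterates over all rules until no type set changes any more.
    static Map<String, Set<String>> compute(List<Rule> rules) {
        Map<String, Set<String>> types = new LinkedHashMap<>();
        for (Rule r : rules) {
            types.put(r.lhs, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule r : rules) {
                Set<String> target = types.get(r.lhs);
                if (r.localName != null) {
                    // Annotated rule: it produces an element of that name.
                    changed |= target.add(r.localName);
                } else {
                    // Unannotated rule (assumption): pass through RHS types.
                    for (String sym : r.rhsNonTerminals) {
                        Set<String> source = types.get(sym);
                        if (source != null && source != target) {
                            changed |= target.addAll(source);
                        }
                    }
                }
            }
        }
        return types;
    }

    // The annotated grammar of the example, reduced to local names.
    static List<Rule> sexpGrammar() {
        return Arrays.asList(
            new Rule("Sexp", null, "Atom"),
            new Rule("Sexp", "List"),
            new Rule("Atom", "Number"),
            new Rule("Atom", "Name"),
            new Rule("Atom", "Nil"),
            new Rule("List", "Nil"),
            new Rule("List", null, "List", "Sexp"));
    }
}
```

Under these assumptions, the iteration stabilizes after three passes over the rules: Atom maps to {Name, Nil, Number}, while Sexp and List both map to {List, Name, Nil, Number}.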
[See also XmlAnnotationParserCode.grm]
|
Copyright © 2019 by Hassan Aït-Kaci; All Rights Reserved.