BLD_doc.html

BLD_doc.grm

Generalities
Implementation conventions
- Tokenizing
  - Recognized tokens
  - Important notes regarding lexical analysis
- Parsing
  - Recognized PS constructs
  - Important notes regarding syntax analysis
The raw BNF's
XML serialization annotations

Generalities

This is a Jacc grammar for the so-called Presentation Syntax (PS) of the RIF Basic Logic Dialect (BLD) developed through the activities of the RIF Working Group. This HTML file is the root of a hyperlinked documentation set for exploring the Jacc grammar for BLD by navigating through its elements: rules, terminal symbols, and non-terminal symbols. The comments accompanying some rules come from the original documents. The documentation also contains the pure Yacc rules (i.e., without semantic actions) and the XML serialization mappings. It is generated by Jacc from the Jacc grammar specified in file BLD.grm (i.e., with the command "jacc -doc BLD"). Along with this Jacc grammar file, there are other supporting source files.

This Jacc grammar is a transcription of the EBNF for the canonical syntax of the RIF BLD. This syntax is canonical in that this EBNF defines the kernel constructs used for the BLD-to-XML transformation rules. In addition to the canonical BLD PS language, it has been proposed to allow a simpler syntax for writing RIF use cases. This simpler syntax, the so-called Abridged PS, extends the canonical syntax with various shorthands for RIF constants and for common expressions such as arithmetic. It is not canonical PS: it is syntactic sugar that is desugared into the canonical form.

Implementation conventions

The Jacc grammar specification given here is a literal transcription of the BNF rules given in the above references, adapted to the needs of the Jacc format. There are two sets of grammar rules:

the rules for the Basic Logic Rule language (BLR); and,
the rules for the Basic Logic Condition (BLC) language.

Tokenizing

(N.B.: See the source file of the tokenizer Tokenizer.java.)

Recognized tokens

The terminal symbols are:

*Token*	*Value*
`OPENPAR`	`'('`
`CLOSEPAR`	`')'`
`OPENBRA`	`'['`
`CLOSEBRA`	`']'`
`OPENMETA`	`'(*'`
`CLOSEMETA`	`'*)'`
`DOCUMENT`	`'Document'`
`BASE`	`'Base'`
`PREFIX`	`'Prefix'`
`IMPORT`	`'Import'`
`GROUP`	`'Group'`
`EXTERNAL`	`'External'`
`AND`	`'And'`
`OR`	`'Or'`
`EXISTS`	`'Exists'`
`FORALL`	`'Forall'`
`IF`	`':-'`
`ARROW`	`'->'`
`LEXSPACE`	`'^^'`
`EQUAL`	`'='`
`MEMBER`	`'#'`
`SUBCLASS`	`'##'`
`COLON`	`':'`
`NUMBER`	(possibly signed) integer, decimal, or floating-point
`VARIABLE`	maximal-length sequence of word characters starting with a `'?'`
`LOCALNAME`	maximal-length sequence of word characters starting with a `'_'`
`STRING`	a double-quoted string containing any character (using `'\'` to escape `'"'`)
`IDENTIFIER`	maximal-length sequence of word characters starting with a letter

Important notes regarding lexical analysis

NUMBER is a token representing numbers.
VARIABLE is recognized by its leading '?', which the lexer strips from the token it returns. Consequently, '?' is not a separate punctuation mark, unlike what the BLC language's EBNF shows.
LOCALNAME is recognized by its leading '_', which the lexer strips from the token it returns. Consequently, '_' is not a separate punctuation mark, unlike what the DTB language's EBNF for shorthands shows.
Using STRING dispenses with the spurious "..."^^ notation and makes '^^' an infix operator. In other words, the opening and closing double quotes are part of the token itself and need not appear at the grammar level.
IDENTIFIER is any maximal sequence of non-separator, non-punctuation Unicode characters that does not start with a '?'.
Note that the colon character (':') is tokenized as punctuation: a SymSpace is parsed as a pair of IDENTIFIERs separated by a colon. Consequently, ':' is a separate punctuation mark, unlike what the EBNF shows.

The above conventions have been reached after a careful analysis of the various notions of what constitutes a constant in RIF BLD.
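
The lexical conventions above can be illustrated with a small Java sketch. It is not the Tokenizer.java that accompanies the grammar: the class name BldLexSketch, the method tokenize, and the "KIND(text)" output format are all hypothetical, only a subset of the tokens is covered, and "word characters" is assumed to mean the ASCII \w class. The sketch shows the leading '?' and '_' being stripped, the double quotes staying inside the STRING token, and ':' coming out as a separate token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the lexical conventions; NOT the actual Tokenizer.java.
final class BldLexSketch {
    // Token kinds covered by this sketch (a subset of the terminal symbols).
    private static final String[] KINDS =
        {"VARIABLE", "LOCALNAME", "STRING", "LEXSPACE", "COLON", "IDENTIFIER"};

    private static final Pattern TOKEN = Pattern.compile(
          "\\s+"                                  // whitespace: matched, then skipped
        + "|(?<VARIABLE>\\?\\w+)"                 // '?' followed by word characters
        + "|(?<LOCALNAME>_\\w+)"                  // '_' followed by word characters
        + "|(?<STRING>\"(?:\\\\.|[^\"\\\\])*\")"  // double-quoted, backslash escapes
        + "|(?<LEXSPACE>\\^\\^)"                  // the infix '^^' operator
        + "|(?<COLON>:)"                          // ':' is separate punctuation
        + "|(?<IDENTIFIER>\\p{L}\\w*)");          // starts with a letter

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at offset " + pos);
            }
            for (String kind : KINDS) {
                String text = m.group(kind);
                if (text == null) continue;       // this alternative did not match
                // The leading '?' or '_' selects the token kind but is not kept.
                boolean strip = kind.equals("VARIABLE") || kind.equals("LOCALNAME");
                tokens.add(kind + "(" + (strip ? text.substring(1) : text) + ")");
                break;
            }
            pos = m.end();                        // whitespace adds no token
        }
        return tokens;
    }
}
```

With this sketch, the input ?x _n "hi"^^xs:string is split into a VARIABLE, a LOCALNAME, a STRING, the LEXSPACE operator, and the SymSpace as IDENTIFIER, COLON, IDENTIFIER.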

In the RIF specification of the EBNF for the Rule Language, it is specified that:

  IRIMETA        ::= '(*' IRICONST? (Frame | 'And' '(' Frame* ')')? '*)'
  Frame          ::= TERM '[' (TERM '->' TERM)* ']'
  TERM           ::= IRIMETA? (Const | Var | Expr | 'External' '(' Expr ')')
  Const          ::= '"' UNICODESTRING '"^^' SYMSPACE | CONSTSHORT
  SYMSPACE       ::= ANGLEBRACKIRI | CURIE

where CONSTSHORT, ANGLEBRACKIRI, and CURIE are defined (in the DTB shorthand notation for RIF constants) by:

  CURIE         ::= PNAME_LN | PNAME_NS
  CONSTSHORT    ::= ANGLEBRACKIRI         // shorthand for "..."^^rif:iri
                  | CURIE                 // shorthand for "..."^^rif:iri
                  | '"' UNICODESTRING '"' // shorthand for "..."^^xs:string
                  | NumericLiteral        // shorthand for "..."^^xs:integer,xs:decimal,xs:double
                  | '_' LocalName         // shorthand for "..."^^rif:local

where:

  ANGLEBRACKIRI ::= '<' ([^<>"{}|^`]-[#x00-#x20])* '>'
  PNAME_LN      ::= PNAME_NS PN_LOCAL
  PNAME_NS      ::= PN_PREFIX? ':'
  PN_LOCAL      ::= (PN_CHARS_U | [0-9]) ((PN_CHARS|'.')* PN_CHARS)?
  PN_PREFIX     ::= PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)?
  PN_CHARS_U    ::= PN_CHARS_BASE | '_'
  PN_CHARS      ::= PN_CHARS_U
                  | '-'
                  | [0-9]
                  | #x00B7
                  | [#x0300-#x036F]
                  | [#x203F-#x2040]
  PN_CHARS_BASE ::= [A-Z]
                  | [a-z]
                  | [#x00C0-#x00D6]
                  | [#x00D8-#x00F6]
                  | [#x00F8-#x02FF]
                  | [#x0370-#x037D]
                  | [#x037F-#x1FFF]
                  | [#x200C-#x200D]
                  | [#x2070-#x218F]
                  | [#x2C00-#x2FEF]
                  | [#x3001-#xD7FF]
                  | [#xF900-#xFDCF]
                  | [#xFDF0-#xFFFD]
                  | [#x10000-#xEFFFF]
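
The CONSTSHORT shorthands listed above each stand for a canonical "..."^^SYMSPACE constant. As a rough illustration, the following Java sketch desugars a shorthand into that canonical form. The class and method names are hypothetical. How the numeric forms map to the three datatypes (plain digits to xs:integer, a decimal point to xs:decimal, an exponent to xs:double) is an assumption of this sketch. CURIEs are deliberately left out because expanding them requires the Prefix declarations, and embedded quotes are not escaped.

```java
// Hypothetical sketch; not part of the Jacc grammar or its supporting sources.
final class ConstShortSketch {

    /** Rewrites one CONSTSHORT shorthand into canonical "..."^^symspace form. */
    static String desugar(String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s + "^^xs:string";                     // '"' UNICODESTRING '"'
        }
        if (s.length() >= 2 && s.startsWith("<") && s.endsWith(">")) {
            return quote(s.substring(1, s.length() - 1)) + "^^rif:iri";  // ANGLEBRACKIRI
        }
        if (s.length() > 1 && s.startsWith("_")) {
            return quote(s.substring(1)) + "^^rif:local"; // '_' LocalName
        }
        // Assumed mapping of numeric lexical forms to datatypes.
        if (s.matches("[+-]?\\d+")) {
            return quote(s) + "^^xs:integer";
        }
        if (s.matches("[+-]?(\\d+\\.\\d*|\\.\\d+)")) {
            return quote(s) + "^^xs:decimal";
        }
        if (s.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)[eE][+-]?\\d+")) {
            return quote(s) + "^^xs:double";
        }
        // CURIEs need a prefix table and are outside this sketch.
        throw new IllegalArgumentException("not handled by this sketch: " + s);
    }

    private static String quote(String lexical) {
        return "\"" + lexical + "\"";
    }
}
```

For instance, under these assumptions the shorthand 42 becomes "42"^^xs:integer and _a becomes "a"^^rif:local.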

Tokenizing the PS grammar is complicated by the fact that the IRIs given as arguments to the Prefix and Base pragmas, which declare shorthands for IRIs, are not enclosed in double quotes. Handling them as written would require parsing IRIs, which is beyond the goal of our prototype and unnecessary in this case. This is not so in the canonical PS, where all such IRIs are double-quoted strings, which greatly simplifies tokenizing. It is just as simple to require double quotes for the Prefix and Base pragmas as well, which is what our prototype does.
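
To make the benefit concrete, here is a minimal Java sketch of how a Prefix declaration could be recorded once its IRI argument arrives as an ordinary STRING token, so that no IRI parsing is needed. The class PrefixTableSketch and its methods are hypothetical and are not taken from the prototype. Expanding a shorthand by plain concatenation of the declared IRI and the local part is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the prototype's actual handling of Prefix may differ.
final class PrefixTableSketch {
    private final Map<String, String> prefixes = new HashMap<>();

    /** Records Prefix(name "iri"), where the IRI is a quoted STRING token. */
    void declarePrefix(String name, String stringToken) {
        prefixes.put(name, unquote(stringToken));
    }

    /** Expands prefix:local by concatenation (an assumption of this sketch). */
    String expand(String prefix, String local) {
        String iri = prefixes.get(prefix);
        if (iri == null) {
            throw new IllegalArgumentException("undeclared prefix: " + prefix);
        }
        return iri + local;
    }

    /** Removes the surrounding double quotes that are part of a STRING token. */
    static String unquote(String stringToken) {
        if (stringToken.length() < 2
                || !stringToken.startsWith("\"") || !stringToken.endsWith("\"")) {
            throw new IllegalArgumentException("not a quoted string: " + stringToken);
        }
        return stringToken.substring(1, stringToken.length() - 1);
    }
}
```

The IRI is treated as an opaque string throughout: the lexer only has to find the closing quote, and nothing in the table inspects the IRI's internal structure.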

Parsing

Recognized PS constructs

Important notes regarding syntax analysis

The raw BNF's

The two grammars for the BLD (Condition and Rule) languages expressed in Yacc form are given below.

BLD Rule Language

The original EBNF is accessible in the specification of the BLD Rule Language. It is reproduced here for convenience:

  Document  ::= IRIMETA? 'Document' '(' Base? Prefix* Import* Group? ')'
  Base      ::= 'Base' '(' IRI ')'
  Prefix    ::= 'Prefix' '(' Name IRI ')'
  Import    ::= IRIMETA? 'Import' '(' IRICONST PROFILE? ')'
  Group     ::= IRIMETA? 'Group' '(' (RULE | Group)* ')'
  RULE      ::= (IRIMETA? 'Forall' Var+ '(' CLAUSE ')') | CLAUSE
  CLAUSE    ::= Implies | ATOMIC
  Implies   ::= IRIMETA? (ATOMIC | 'And' '(' ATOMIC* ')') ':-' FORMULA
  PROFILE   ::= TERM

The Jacc rules corresponding to this EBNF are given in BLR.grm.

BLD Condition Language

The original EBNF is accessible in the specification of the BLD Condition Language. It is reproduced here for convenience:

  FORMULA   ::= ATOMIC
              | IRIMETA? 'And' '(' FORMULA* ')'
              | IRIMETA? 'Or' '(' FORMULA* ')'
              | IRIMETA? 'Exists' Var+ '(' FORMULA ')'
              | IRIMETA? 'External' '(' Atom | Frame ')'
  ATOMIC    ::= IRIMETA? (Atom | Equal | Member | Subclass | Frame)
  Atom      ::= UNITERM
  UNITERM   ::= Const '(' (TERM* | (Name '->' TERM)*) ')'
  Equal     ::= TERM '=' TERM
  Member    ::= TERM '#' TERM
  Subclass  ::= TERM '##' TERM
  Frame     ::= TERM '[' (TERM '->' TERM)* ']'
  TERM      ::= IRIMETA? (Const | Var | Expr | 'External' '(' Expr ')')
  Expr      ::= UNITERM
  Const     ::= '"' UNICODESTRING '"^^' SYMSPACE | CONSTSHORT
  Name      ::= UNICODESTRING
  Var       ::= '?' UNICODESTRING
  SYMSPACE  ::= ANGLEBRACKIRI | CURIE
  IRIMETA   ::= '(*' IRICONST? (Frame | 'And' '(' Frame* ')')? '*)'

The Jacc rules corresponding to this EBNF are given in BLC.grm.

Additional ad hoc rules

Some Jacc rules corresponding to temporary ad hoc implementation decisions for the sake of prototyping are given in AdHoc.grm.

XML serialization annotations

This version of the BLD grammar is annotated for simple XML serialization as per the scheme specified in the current BLD document. Each XML serialization annotation generates an HTML documentation file accessible by navigating through the grammar (e.g., that of the rule for Group). The effects of such annotations are summarized in the table of XML serialization mappings.

Essentially, the format of a Jacc grammar is that of a Yacc grammar. As in Yacc, Jacc rules may be annotated with semantic actions in the form of Java code involving the rule's RHS constituents (denoted by $1, $2, ..., $n - the so-called pseudo-variables, where the index n in $n refers to the position of the constituent in the RHS). Such actions appear between curly braces ('{' and '}') wherever a symbol may appear in a rule's RHS. Jacc also allows an additional form of annotation in the RHS of a rule to indicate the XML serialization pattern of the abstract syntax tree (AST) node corresponding to a derivation with this rule. This XML serialization meta-annotation comes between square brackets ('[' and ']') and is of the form described in a simple XML serialization annotation language.

For example, the annotated rule:

  QUANTIF
     : 'Exists' Var_plus '(' CONDIT ')'
     [
         nsprefix   : hrl
         localname  : quantifier
         attributes : {kind="existential"}
         children   : (2,4)
     ]
     ;

means that an AST node for this rule will be serialized thus:

   <hrl:quantifier kind="existential">
     (XML serialization of Var_plus)
     (XML serialization of CONDIT)
   </hrl:quantifier>

A rule without an XML serialization annotation follows a default behavior: its serialization is the concatenation of the serializations of its RHS constituents, with punctuation eliminated; i.e., empty nodes and literal tokens (tokens that do not carry a value) are dropped. (See the Jacc XML annotation manual for more details.)
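
The two behaviors, annotated and default, can be mimicked with a short Java sketch. It is not Jacc's implementation: the class XmlSerializeSketch and its methods are hypothetical. RHS constituents are modeled as already-serialized strings, with null standing for a token that carries no value, and attribute values are not escaped.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of the serialization scheme; NOT Jacc's actual code.
final class XmlSerializeSketch {

    /**
     * Annotated rule: wrap the selected children in <nsprefix:localname ...>.
     * Child positions are 1-based, like $1, $2, ... in the rule's RHS.
     */
    static String annotated(String nsprefix, String localname,
                            Map<String, String> attributes,
                            int[] children, List<String> rhsXml) {
        String tag = nsprefix + ":" + localname;
        StringBuilder sb = new StringBuilder("<").append(tag);
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            sb.append(' ').append(a.getKey())
              .append("=\"").append(a.getValue()).append('"');
        }
        sb.append('>');
        for (int position : children) {
            sb.append(rhsXml.get(position - 1));
        }
        return sb.append("</").append(tag).append('>').toString();
    }

    /** Unannotated rule: concatenate the constituents, dropping valueless ones. */
    static String byDefault(List<String> rhsXml) {
        StringBuilder sb = new StringBuilder();
        for (String xml : rhsXml) {
            if (xml != null && !xml.isEmpty()) {
                sb.append(xml);
            }
        }
        return sb.toString();
    }
}
```

For the QUANTIF rule, the five RHS constituents would be modeled as null, the serialization of Var_plus, null, the serialization of CONDIT, and null; children (2,4) then selects exactly the two non-punctuation constituents.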

For example, see the two test files examples/Test1.bld and examples/Test2.bld. Running the command examples/bld on them produces the XML trees shown in examples/Test1.xml and examples/Test2.xml.

This file was generated on Mon Nov 17 15:35:41 PST 2008 from file BLD_doc.grm
by the ilog.language.tools.Hilite Java tool written by Hassan Aït-Kaci