Jacc:
(much more than)
Just
another
compiler
compiler
- Author:
- Hassan Aït-Kaci
- Copyright:
- © 2000 - present ILOG, S.A.
Quick Links
At first sight, Jacc may be seen as a "100% Pure Java"
implementation of an LALR(1) parser generator [1] in
the fashion of the well known UNIX tool yacc---"yet
another compiler compiler" [2]. Jacc is
in fact much more than ... just another
compiler compiler: it extends
yacc to enable the generation of flexible and efficient
Java-based parsers and provides enhanced functionality rarely available
in other similar systems.
The fact that Jacc uses yacc's metasyntax makes it
readily usable on most yacc grammar. Other Java-based parser
generators all depart from yacc's format, requiring nontrivial
metasyntactic preprocessing to be used on existing yacc grammars
(which abound in the world, yacc being by far the most popular
tool for parser generation). Importantly, Jacc is programmed in
pure Java---this makes it fully portable to all existing platforms, and
immediately exploitable for web-based software applications.
Jacc further stands out among other known parser generators,
whether Java-based or not, thanks to several additional features. The
most notable are:
- Jacc uses the most efficient algorithm known to date for
its most critical computation (viz., propagation of LALR(1)
lookahead sets). Traditional yacc implementations use the
method developed by DeRemer and Penello [3];
Jacc uses an improved method due to Park, Choe, and Chang
[4], which drastically ameliorates the one by
DeRemer and Penello. To this author's best knowledge, no other
(available) Java-based metacompiler system implements the Park,
Choe, and Chang method [5].
- Jacc allows the user to define a complete class hierarchy
of parse node classes (the objects pushed on the parse stack and that
make up the parse tree: nonterminal and terminal symbols), along with
any Java attributes to be used in semantic actions annotating
grammar rules. All these attributes are accessible directly on any
pseudo-variable associated with a grammar rule constituents
(i.e., $$, $1, $2, etc.).
- Jacc makes use of all the well-known conveniences
defining precedences and associativity associated to some terminal
symbols for resolving parser conflicts that may arise. While such
conflicts may in theory be eliminated for any LALR(1) grammar, such
a grammar is rarely completely obtainable. In that case,
yacc technology falls short of providing a safe parser for
non-LALR grammar. Yet, Jacc can accommodate any such
eventual unresolved conflict using non-deterministic parse
actions that may be tried and undone.
- Further still, Jacc can also tolerate non-deterministic
tokens. In other words, the same token may be categorized as
several distinct lexical units to be tried in turn. This allows,
for example, parsing languages that use no reserved keywords (or
more precisely, whose keywords may also be tokenized as
identifiers, for instance).
- Better yet, Jacc allows dynamically
(re-)definable operators in the style of the Prolog language
(i.e., at parse-time). This offers great flexibility for
on-the-fly syntax customization, as well as a much greater
recognition power, even where operator symbols may be ambiguous
(i.e., specified to have several precedences and/or
associativity for different arities).
- Jacc supports partial parsing. In other words, in a
grammar, one may indicate any nonterminal as a parse root. Then,
constructs from the corresponding sublanguage may be parsed
independently from a reader stream or a string.
- Jacc automatically generates a full HTML documentation of
a grammar as a set of interlinked files from /** ... */
java comments in the grammar file, including a navigatable pure
grammar in "yacc form", obtained after removing all semantic and
serialization annotations, leaving only the bare syntactic rules.
- Jacc may be directed to build a parse-tree automatically
(for the concrete syntax, but also for a more implicit form which
rids a concrete syntax tree of most of its useless information).
By contrast, regular yacc
necessitates that a programmer add explicit semantic actions for
this purpose.
- Jacc supports a simple annotational scheme for automatic
XML serialization of complex AST's. Grammar rules and
non-punctuation terminal symbols (i.e., meaning-carrying
tokens; e.g., identifiers, numbers, etc.) may be
annotated with simple XML templates expressing their XML forms.
Jacc may then use these templates to transform the
Concrete Parse Tree (CST) into an AST of radically different
structure, constructed as a JDOM
XML document. This yields a convenient declarative specification of
a tree transduction process guided by just a few simple
annotations, where Jacc's "sensible" behavior on
unannotated rules and terminals works "as expected". This greatly
eases the task of retargeting the serialization of a language
depending on variable or evolving XML vocabularies.
Using Jacc
A grammar can be specified using the usual familiar yacc syntax
with semantic actions specified as Java code. The format of the grammar
file is essentially the same as that required by
yacc, with some minor differences, and a few additional powerful
features. Not using the additional features makes it essentially similar
to the yacc format.
For instructions on how to organize a Jacc grammar, please
refer to the documentation of the grammar
format, or to the description of grammar
commands. If you wish to use Jacc, follow these simple steps. For details on how jacc
extends yacc to support Prolog-style dynamic operators, read the
specification. You may also want to peruse
the code of Jacc grammar examples.
References
For detailed explanations of most constructions and algorithms used by
this package, please refer to the following.
-
Alfred Aho, Ravi Sethi, and Jeffrey Ullman, Compilers:
Principles, Techniques, and Tools. Addison-Wesley, 1986.
(This text is also familarly known as The Dragon Book.)
-
Stephen C. Johnson, "Yacc: Yet Another Compiler Compiler,"
Computer Science Technical Report 32. AT&T Bell Labs,
Murray Hill, NJ, 1975.
(Reprinted in the 4.3BSD Unix Programmer's Manual, Supplementary
Documents 1, PS1:15. UC Berkeley, 1986.)
-
Frank DeRemer, and Thomas Penello, "Efficient computation of
lookahead sets", ACM Transactions of Programming Languages and
Systems, 4(4), pp. 615-749 (Oct. 82).
-
Joseph Park, K.M Choe, and C.H. Chang, "A new analysis of LALR
formalisms", ACM Transactions of Programming Languages and
Systems, 7(1), pp. 159-175 (Jan. 85).
-
Kwang-Moo Choe, Personal Communication
(choe@compiler.kaist.ac.kr).
Korean Advanced Institute of Science and Technology, Seoul, South
Korea (Dec. 2000).
- Version:
- Last modified on Tue Jun 10 09:36:05 2008 by hak