Jacc:
(much more than)
Just
another
compiler
compiler
- Author:
- Hassan Aït-Kaci
- Copyright:
- © 2000 ILOG, S.A.
Quick Links
At first sight, Jacc may be seen as a "100% Pure Java"
implementation of an LALR(1) parser generator [1] in
the fashion of the well known UNIX tool yacc---"yet
another compiler compiler" [2]. Jacc is
in fact much more than just another compiler compiler: it extends
yacc to enable the generation of flexible and efficient
Java-based parsers and provides enhanced functionality rarely available
in other similar systems.
The fact that Jacc uses yacc's metasyntax makes it
readily usable on most yacc grammar. Other Java-based parser
generators all depart from yacc's format, requiring nontrivial
metasyntactic preprocessing to be used on existing yacc grammars
(which abound in the world, yacc being by far the most popular
tool for parser generation). Importantly, Jacc is programmed in
pure Java---this makes it fully portable to all existing platforms, and
immediately exploitable for web-based software applications.
Jacc further stands out among other known parser generators,
whether Java-based or not, thanks to several additional features. The
most notable are:
- Jacc allows the user to define a complete class hierarchy
of parse node classes (the objects pushed on the parse stack and that
make up the parse tree: nonterminal and terminal symbols), along with
any Java attributes to be used in semantic actions annotating
grammar rules. All these attributes are accessible directly on any
pseudo-variable associated with a grammar rule constituents
(i.e., $$, $1, $2, etc.).
- Jacc makes use of all the well-known convenviences
defining precedences and associativity associated to some terminal
symbols for resolving parser conflicts that may arise. While such
conflicts may usually be eliminated for some non-LALR(1) grammar,
it is unfortunately not so rare to encounter grammars with
unresolvable conflicts. In that case, yacc falls short of
providing a safe parser for non-LALR grammar. By contrast,
Jacc can correctly accommodate any such eventual
unresolved conflict using non-deterministic parse actions
that may be tried and undone.
- Further still, Jacc can also tolerate non-deterministic
tokens. In other words, the same token may be categorized as
several distinct lexical units to be tried in turn. This allows,
for example, parsing languages that use no reserved keywords (or
more precisely, whose keywords may also be tokenized as
identifiers, for instance).
- Better yet, Jacc allows dynamically
(re-)definable operators in the style of the Prolog language
(i.e., at parse-time). This offers great flexibility for
on-the-fly syntax customization, as well as a much greater
recognition power, even where operator symbols may be ambiguous
(i.e., specified to have several precedences and/or
associativity for different arities).
- Jacc supports partial parsing. In other words, in a
grammar, one may indicate any nonterminal as a parse root. Then,
constructs from the corresponding sublanguage may be parsed
independently from a reader stream or a string.
- Jacc automatically generates a full HTML documentation of
a grammar as a set of interlinked HTML files, including a
navigatable pure grammar in "yacc form" removing all semantic and
serialization annotations and give the bare syntactic rules.
Javadoc comments in the grammar file (/** ... */) may b
used to describe any part of a grammar; these comments will
generate formatted text in the appropriate files.
- Jacc may be directed to build a parse-tree automatically
(for the concrete syntax, but also for a more implicit form which
rids a concrete syntax tree of most of its useless information).
yacc necessitates that a programmer add explicit semantic
actions for this purpose. It also supports a simple annotational
scheme for automatic XML serialization of complex AST's.
- Jacc uses the most efficient algorithm known to date for
its most critical computation (viz., propagation of LALR(1)
lookahead sets). Traditional yacc implementations use the
method developed by DeRemer and Penello [3];
Jacc uses an improved method due to Park, Choe, and Chang
[4], which drastically ameliorates the one by
DeRemer and Penello. To this author's best knowledge, no other
available system implements the Park, Choe, and Chang method [5].
Using Jacc
A grammar can be specified using the usual familiar yacc syntax
with semantic actions specified as Java code. The format of the grammar
file is essentially the same as that required by
yacc, with some minor differences, and a few additional powerful
features. Not using the additional features makes it essentially similar
to the yacc format.
For instructions on how to organize a Jacc grammar, please
refer to the documentation of the grammar
format, or to the description of grammar
commands. If you wish to use Jacc, follow these simple steps. For details on how jacc
extends yacc to support Prolog-style dynamic operators, read the
specification. You may also want to peruse
the code of Jacc grammar examples.
References
For detailed explanations of most constructions and algorithms used by
this package, please refer to the following.
-
Alfred Aho, Ravi Sethi, and Jeffrey Ullman, Compilers:
Principles, Techniques, and Tools. Addison-Wesley, 1986.
(This text is also familarly known as The Dragon Book.)
-
Stephen C. Johnson, "Yacc: Yet Another Compiler Compiler,"
Computer Science Technical Report 32. AT&T Bell Labs,
Murray Hill, NJ, 1975.
(Reprinted in the 4.3BSD Unix Programmer's Manual, Supplementary
Documents 1, PS1:15. UC Berkeley, 1986.)
-
Frank DeRemer, and Thomas Penello, "Efficient computation of
lookahead sets", ACM Transactions of Programming Languages and
Systems, 4(4), pp. 615-749 (Oct. 82).
-
Joseph Park, K.M Choe, and C.H. Chang, "A new analysis of LALR
formalisms", ACM Transactions of Programming Languages and
Systems, 7(1), pp. 159-175 (Jan. 85).
-
Kwang-Moo Choe, Personal Communication
(choe@compiler.kaist.ac.kr).
Korean Advanced Institute of Science and Technology, Seoul, South
Korea (Dec. 2000).
- Version:
- Last modified on Wed Oct 25 09:34:58 2006 by hak