I'd also recommend against Yacc. As others have said, it's a great tool and very powerful, but a recursive descent parser will do the job 99% of the time and will be much easier. Writing a good unambiguous grammar for Yacc can be tricky, and such a grammar is much more difficult to debug than a recursive descent parser. A lot of languages now have "parser combinator" libraries which make it even easier to write a recursive descent parser. I'd do a Google search to see if there's one available for your programming language of choice.
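
To give a feel for how little code a recursive descent parser needs, here is a minimal sketch in Python. The arithmetic grammar, the token regex, and the names (tokenize, Parser, evaluate) are all assumptions made up for illustration; nothing here comes from a particular library or tutorial.

```python
import re

# Hypothetical grammar, chosen only for illustration:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    """One method per grammar rule; each method consumes its own tokens."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.next()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.next()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        tok = self.next()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            value = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def evaluate(text):
    """Parse and evaluate an arithmetic expression in one pass."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Each grammar rule maps onto one method, which is why this style tends to be easy to step through in a debugger. The loops in expr and term make subtraction and division left-associative; that is a design choice made by the code itself, not something a tool checked.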

Unfortunately, I don't have any tutorial recommendations. I still have my compiler textbook from university (which is probably way more information than you need), so I haven't personally needed the sort of tutorial you're looking for.

On the other hand (and no promises), I've been trying to start a blog, so if you can't find a good tutorial on recursive descent parsing, let me know and I *might* find the time to write one myself.

Matt Gordon


On Sun, Sep 16, 2018 at 1:47 PM D. Hugh Redelmeier via talk <talk@gtalug.org> wrote:
After advising against YACC, I thought I should promote it a bit.

YACC uses a formal declarative system for specifying a language
grammar (Backus-Naur Form).  This has a number of nice features:

- BNF is very well described and extensively used in the literature

- it was invented to describe the programming language Algol 60.
  That document is one of the classics of computer science
  and is still a must-read.  Here's a copy:
        <https://www.masswerk.at/algol60/report.htm>

- many bastardizations of BNF have been used.  The real thing is better
  than most of its successors.

- a BNF grammar is a context-free grammar (Chomsky's term.  Yes, that
  Noam Chomsky)

- if a grammar is ambiguous, YACC will tell you.  Not at runtime but
  at table-building time.  This is really really useful because it is
  very easy to inadvertently create an ambiguous grammar -- generally
  a Bad Thing.  Informal recursive descent parsers never detect such
  problems.

  This feature is especially useful for those still learning about
  language design.

- YACC has features to resolve ambiguities.  They are short-cuts that
  cloud the issues and I think that they are a Bad Thing.

- an LR(k) grammar (invented by Knuth before LALR) means that a
  deterministic Left to Right single-pass parser (i.e. one without any
  backtracking) can "recognize" the language with only a k-symbol
  look-ahead.  LALR(k) is a subset of LR(k) for which it is known how
  to generate an efficient parser.  In practical terms, k should be 1.

- when given a choice between a declarative and a procedural model,
  always at least consider declarative.  Declarative is much easier to
  reason about as the system gets even a little complicated.
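
To make the ambiguity point in the list above concrete, here is a small Python sketch. It is an illustration under assumptions, not anything YACC does internally: the grammar E -> E '-' E | NUMBER and the helper names (parse_trees, tree_value) are hypothetical. The code enumerates every parse tree by brute force, which shows that a single input can have more than one reading. A tool like YACC would report a conflict for this grammar at table-building time, while a hand-written parser would quietly pick one reading.

```python
from functools import lru_cache

def parse_trees(tokens):
    """Return every parse tree for tokens under the ambiguous grammar
    E -> E '-' E | NUMBER (a hypothetical example grammar)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(lo, hi):
        # All ways to derive tokens[lo:hi] from E.
        result = []
        if hi - lo == 1 and tokens[lo].isdigit():
            result.append(int(tokens[lo]))
        # Try every '-' inside the span as the top-level operator.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] == "-":
                for left in trees(lo, mid):
                    for right in trees(mid + 1, hi):
                        result.append((left, "-", right))
        return tuple(result)

    return list(trees(0, len(tokens)))

def tree_value(tree):
    """Evaluate a parse tree built by parse_trees."""
    if isinstance(tree, int):
        return tree
    left, _, right = tree
    return tree_value(left) - tree_value(right)

all_trees = parse_trees(["1", "-", "2", "-", "3"])
values = sorted(tree_value(t) for t in all_trees)
print(len(all_trees), values)  # prints: 2 [-4, 2]
```

The two trees correspond to (1 - 2) - 3 and 1 - (2 - 3). Rewriting the grammar so that only one tree exists (for example E -> E '-' NUMBER | NUMBER) removes the ambiguity in the grammar itself rather than hiding it behind a precedence short-cut.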

One learns a lot about language design by writing a BNF grammar and
debugging it through YACC.

lex is based on some theory (Chomsky Type 3 (Regular) languages) but
is more ad hoc.
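
As a rough illustration of the regular-expression approach that lex takes, here is a minimal tokenizer sketch using Python's standard re module. The token names and patterns are assumptions invented for this example, and this is not how lex itself is implemented: lex picks the longest match among its rules, whereas Python's alternation takes the first alternative that matches.

```python
import re

# Hypothetical token rules: (name, regular expression).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[-+*/=()]"),
    ("SKIP",   r"\s+"),
]

# Combine the rules into one pattern with a named group per rule.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token_name, lexeme) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, tokenize("x = 42 + y1") yields IDENT, OP, NUMBER, OP, IDENT tokens in that order.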