
I'd also recommend against Yacc. As others have said, it's a great tool and very powerful but a recursive descent parser will do the job 99% of the time and will be much easier. Writing a good unambiguous grammar for Yacc can be tricky and is much more difficult to debug than a recursive descent parser. A lot of languages now have "parser combinator" libraries which make it even easier to write a recursive descent parser. I'd do a google search to see if there's one available for your programming language of choice. Unfortunately I don't have any tutorial recommendations I still have my compiler textbook from university (which is probably way more information than you need) so I haven't personally needed the sort of tutorial you're looking for. On the other hand (and no promises), I've been trying to start a blog so if you can't find a good tutorial on recursive descent parsing let me know and I *might* find the time to write one myself. Matt Gordon On Sun, Sep 16, 2018 at 1:47 PM D. Hugh Redelmeier via talk <talk@gtalug.org> wrote:
Ater advising against YACC, I thought I should promote it a bit.
YACC uses a formal declarative system for specifying a language grammar (Backus-Naur Form). This has a number of nice features:
- BNF is very well described and extensively used in the literature
- it was invented to describe the programming language Algol 60. That document is one of the classics of computer science and is still a must-read. Here's a copy: <https://www.masswerk.at/algol60/report.htm>
- many bastardizations of BNF have been used. The real thing is better than most of its successors.
- a BNF grammar is a context-free grammar (Chomsky's term. Yes, that Noam Chomsky)
- if a grammar is ambiguous, YACC will tell you. Not at runtime but at table-building time. This is really really useful because it is very easy to inadvertently create an ambiguous grammar -- generally a Bad Thing. Informal recursive descent parsers never detect such problems.
This feature is especially useful for those still learning about language design.
- YACC has features to resolve ambiguities. They are short-cuts that cloud the issues and I think that they are a Bad Thing.
- an LR(k) grammar (invented by Knuth before LALR) means that a deterministic Left to Right single-pass parser (i.e. one without any backtracking) can "recognize" the language with only a k-symbol look-ahead. LALR(k) is a subset of LR(k) for which it is known how to generate an efficient parser. In practical terms, k should be 1.
- when given a choice between a declarative and a procedural model, always at least consider declarative. Declarative is much easier to reason about as the system gets even a little complicated.
One learns a lot about language design by writing a BNF grammar and debugging it through YACC.
lex is based on some theory (Chomsky Type 0 (Regular) languages) but is more ad hoc. --- Talk Mailing List talk@gtalug.org https://gtalug.org/mailman/listinfo/talk