
Ater advising against YACC, I thought I should promote it a bit. YACC uses a formal declarative system for specifying a language grammar (Backus-Naur Form). This has a number of nice features: - BNF is very well described and extensively used in the literature - it was invented to describe the programming language Algol 60. That document is one of the classics of computer science and is still a must-read. Here's a copy: <https://www.masswerk.at/algol60/report.htm> - many bastardizations of BNF have been used. The real thing is better than most of its successors. - a BNF grammar is a context-free grammar (Chomsky's term. Yes, that Noam Chomsky) - if a grammar is ambiguous, YACC will tell you. Not at runtime but at table-building time. This is really really useful because it is very easy to inadvertently create an ambiguous grammar -- generally a Bad Thing. Informal recursive descent parsers never detect such problems. This feature is especially useful for those still learning about language design. - YACC has features to resolve ambiguities. They are short-cuts that cloud the issues and I think that they are a Bad Thing. - an LR(k) grammar (invented by Knuth before LALR) means that a deterministic Left to Right single-pass parser (i.e. one without any backtracking) can "recognize" the language with only a k-symbol look-ahead. LALR(k) is a subset of LR(k) for which it is known how to generate an efficient parser. In practical terms, k should be 1. - when given a choice between a declarative and a procedural model, always at least consider declarative. Declarative is much easier to reason about as the system gets even a little complicated. One learns a lot about language design by writing a BNF grammar and debugging it through YACC. lex is based on some theory (Chomsky Type 0 (Regular) languages) but is more ad hoc.