
Every now and then, I come across situations where an ideal solution would have been a custom parser/interpreter, either to parse a data format or to write my own set of commands. In each case, I don't know enough to write it (I'm not Comp Sci) or don't have time. So, I end up resorting to shell, awk, python, etc.

Now, I want to take some course on Lex/Yacc, finally! What online course do you recommend?

--
William Park <opengeometry@yahoo.ca>

| From: William Park via talk <talk@gtalug.org>
|
| Every now and then, I come across situations where an ideal solution
| would have been a custom parser/interpreter, either to parse a data
| format or to write my own set of commands. In each case, I don't know
| enough to write it (I'm not Comp Sci) or don't have time. So, I end up
| resorting to shell, awk, python, etc.
|
| Now, I want to take some course on Lex/Yacc, finally!
| What online course do you recommend?

I've never used online courses. Certainly not about this stuff. So I cannot answer your question directly.

What is it that you hope to learn?
- theory?
- practical use of these specific tools?
- for what problems are these tools a good choice?

What languages are you comfortable using to program? I ask because many languages have tools or libraries available that might fully or partially address your needs.

TL;DR: from here on is a long discussion of my experiences with tools like these.

I don't have the habit of using lex. I write my own lexers in C. Not recommended: C is an unforgiving and unhelpful language. But I've done that for 40+ years (and mostly in assembly languages for the decade before that), so I can do it in my sleep.

I think lexing is pretty easy. But maybe that's because I've already thought about most of the choices that need to be made. Perhaps lex provides less experienced programmers with a framework that guides some of those choices.

I don't have the habit of using yacc either. yacc is a parser generator for LALR grammars. The construction of LALR parsers was first described in DeRemer's PhD thesis, but the first implementation was by Wilf LaLonde at U of T. For some reason that I don't remember, LaLonde moved for a bit to Waterloo, and I (an undergrad) found his generator on the Waterloo computer (but no documentation). I played with it, reverse-engineered how the generated tables worked, and started using it. So I was one of the first to use such a gizmo.

It was intoxicating. I invented more and more complicated language constructs because the complexity cost me almost nothing. My particular project was to design a high-level assembler for the PDP-11: think PL/360 (Niklaus Wirth's high-level assembler for the IBM System/360).

I now think that this was a mistake. Parsing is not the hardest problem in writing compilers. A hand-rolled recursive descent parser is fairly easy to write and allows for clearer diagnostics (diagnostics are important!); a minimal sketch of the style appears after this message. But I did learn a lot.

Complex language syntax is a burden on the user. Languages that didn't learn that lesson include C++, Algol 68, and PL/I.

In the Libreswan project, Pluto includes a simple hand-rolled lexer and parser for one kind of config file. I wrote that. After I left the project, someone added a lex + yacc parser for a different kind of config file. When I rejoined the project, the lex + yacc code didn't work right and nobody knew how to fix it. I've kicked the worst problems out of it. But I think I could actually reduce the lines of code, and the bugs, by rewriting it to use the lexer that is already in Pluto together with a minimal recursive descent parser.

LALR parsing is a very interesting discipline. I'm glad I understand it (although it is a bit foggy after all these years). I'm not sure that it is useful for most programmers.

Lexing and parsing are things programmers do all the time, informally, even if they don't know that's what they are doing. Understanding the problems formally can only be a help.

I learned this stuff from books, people, and hacking.
The first book to really present this stuff with a practical parser generator at its core was McKeeman, Horning, and Wortman: A Compiler Generator. (I took courses from the latter two and met McKeeman.) LaLonde's LALR parser generator slotted into that framework and was much better than the parser generator that they used. (Kildall's PL/M was clearly created with, and inspired by, this work. He then used it to create CP/M, which was copied to create MSDOS.)

The field was relatively young and approachable then. Compiler technology has gotten way more developed and harder to approach. As a result, amateurs jump in without the full background and make long-understood mistakes. I don't see a good alternative: there is too much to learn before doing useful work.

The hardest part of compiler writing is code generation and optimization (the "backend"). For many "languages" (like the Libreswan control files) there is no need for code generation. Recent language implementations often use the LLVM backend.

There is a concept of DSLs (Domain Specific Languages): small, special-purpose languages. Perhaps that's what you are trying to create. Googling might turn up light-weight tools to help with that.
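
To make the hand-rolled approach above concrete, here is a minimal sketch -- a toy written for illustration, not the Pluto code -- of a C lexer and recursive descent parser for integer expressions with +, * and parentheses. Each grammar rule becomes one function:

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* --- lexer: turn characters into tokens --- */

    enum token { T_NUM, T_PLUS, T_STAR, T_LPAREN, T_RPAREN, T_END, T_BAD };

    static const char *src;   /* input cursor */
    static enum token tok;    /* current token */
    static long tok_val;      /* value when tok == T_NUM */

    static void next_token(void)
    {
        while (isspace((unsigned char)*src))
            src++;
        switch (*src) {
        case '\0': tok = T_END; return;
        case '+': src++; tok = T_PLUS; return;
        case '*': src++; tok = T_STAR; return;
        case '(': src++; tok = T_LPAREN; return;
        case ')': src++; tok = T_RPAREN; return;
        default:
            if (isdigit((unsigned char)*src)) {
                char *end;
                tok_val = strtol(src, &end, 10);
                src = end;
                tok = T_NUM;
            } else {
                tok = T_BAD;
            }
            return;
        }
    }

    /* --- parser: one function per grammar rule ---
     * expr   = term { "+" term }
     * term   = factor { "*" factor }
     * factor = NUMBER | "(" expr ")"
     */

    static long expr(void);

    static long factor(void)
    {
        long v;
        if (tok == T_NUM) {
            v = tok_val;
            next_token();
        } else if (tok == T_LPAREN) {
            next_token();
            v = expr();
            if (tok != T_RPAREN) {
                /* diagnostics can point at the exact spot */
                fprintf(stderr, "expected ')' near \"%s\"\n", src);
                exit(1);
            }
            next_token();
        } else {
            fprintf(stderr, "expected number or '(' near \"%s\"\n", src);
            exit(1);
        }
        return v;
    }

    static long term(void)
    {
        long v = factor();
        while (tok == T_STAR) {
            next_token();
            v *= factor();
        }
        return v;
    }

    static long expr(void)
    {
        long v = term();
        while (tok == T_PLUS) {
            next_token();
            v += term();
        }
        return v;
    }

    int main(void)
    {
        src = "2 + 3 * (4 + 5)";
        next_token();
        long v = expr();
        if (tok != T_END) {
            fprintf(stderr, "trailing junk near \"%s\"\n", src);
            exit(1);
        }
        printf("%ld\n", v);   /* prints 29 */
        return 0;
    }

Note how term() calling factor() makes * bind tighter than + without any precedence tables: the precedence lives in the structure of the rules, which is exactly what a yacc grammar has to encode explicitly.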

On Sun, 16 Sep 2018 01:40:23 -0400 (EDT) "D. Hugh Redelmeier via talk" <talk@gtalug.org> wrote:
| | From: William Park via talk <talk@gtalug.org>
| | [...]
| | Now, I want to take some course on Lex/Yacc, finally!
| | What online course do you recommend?
| I've never used online courses. Certainly not about this stuff. So I
| cannot answer your question directly.

ditto/dictus

| What is it that you hope to learn?
| - theory?
| - practical use of these specific tools?
| - for what problems are these tools a good choice?
|
| What languages are you comfortable using to program? I ask because
| many languages have tools or libraries available that might fully or
| partially address your needs.

imho, this is the crux - define the requirement properly and then evaluate what tool(s) need using.

personally, I have found that it is sometimes more efficient to learn a new language if the advantages - libraries, an existing code(base), and multiple other factors - enable efficient and early production-ready deliverables.

I try to avoid feelings, politics, and opinions, and focus on the Joe Friday (Dragnet) "just the facts" of it all - which is how (and why) I ended up learning Pascal even though C would have been many times better, for me, to use at the time :)

| TL;DR: from here on is a long discussion of my experiences with tools
| like these.

the rest of it is cool and detailed, if not somewhat jaded in places :)

After advising against YACC, I thought I should promote it a bit.

YACC uses a formal declarative system for specifying a language grammar (Backus-Naur Form). This has a number of nice features:

- BNF is very well described and extensively used in the literature.

- It was invented to describe the programming language Algol 60. That document is one of the classics of computer science and is still a must-read. Here's a copy: <https://www.masswerk.at/algol60/report.htm>

- Many bastardizations of BNF have been used. The real thing is better than most of its successors.

- A BNF grammar is a context-free grammar (Chomsky's term. Yes, that Noam Chomsky).

- If a grammar is ambiguous, YACC will tell you -- not at runtime but at table-building time (see the demonstration grammar below). This is really, really useful because it is very easy to inadvertently create an ambiguous grammar -- generally a Bad Thing. Informal recursive descent parsers never detect such problems. This feature is especially useful for those still learning about language design.

- YACC has features to resolve ambiguities. They are short-cuts that cloud the issues, and I think that they are a Bad Thing.

- An LR(k) grammar (invented by Knuth before LALR) is one that a deterministic Left-to-Right single-pass parser (i.e. one without any backtracking) can "recognize" with only a k-symbol look-ahead. LALR(k) is a subset of LR(k) for which it is known how to generate an efficient parser. In practical terms, k should be 1.

- When given a choice between a declarative and a procedural model, always at least consider declarative. Declarative is much easier to reason about as the system gets even a little complicated.

One learns a lot about language design by writing a BNF grammar and debugging it through YACC.

lex is based on some theory (Chomsky Type 3 (Regular) languages) but is more ad hoc.
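
To see that ambiguity check in action, here is a minimal sketch of a deliberately ambiguous expression grammar (a toy for illustration, not from any project mentioned in this thread). Running plain "yacc ambig.y" or "bison ambig.y" is enough to get the report; no test input or driver program is needed:

    /* ambig.y -- "1 + 2 * 3" can be derived two different ways,
     * so yacc reports shift/reduce conflicts at table-building
     * time, before any input is ever parsed. */
    %token NUMBER
    %%
    expr : expr '+' expr
         | expr '*' expr
         | '(' expr ')'
         | NUMBER
         ;

The conflicts go away either by adding %left '+' and %left '*' declarations (the short-cut criticized above) or by stratifying the grammar into expr/term/factor rules, so that precedence and associativity live in the rules themselves.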

I'd also recommend against Yacc. As others have said, it's a great tool and very powerful, but a recursive descent parser will do the job 99% of the time and will be much easier. Writing a good unambiguous grammar for Yacc can be tricky, and such a grammar is much more difficult to debug than a recursive descent parser.

A lot of languages now have "parser combinator" libraries which make it even easier to write a recursive descent parser. I'd do a google search to see if there's one available for your programming language of choice.

Unfortunately, I don't have any tutorial recommendations. I still have my compiler textbook from university (which is probably way more information than you need), so I haven't personally needed the sort of tutorial you're looking for. On the other hand (and no promises), I've been trying to start a blog, so if you can't find a good tutorial on recursive descent parsing, let me know and I *might* find the time to write one myself.

Matt Gordon
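
Parser combinators are most natural in languages with closures, but the core idea -- a parser is a value, and functions like seq() and alt() build bigger parsers out of smaller ones -- can be sketched even in plain C. This toy (for illustration only; real combinator libraries also return parse results and error positions) recognizes the language ("h" "i") | ("h" "o"):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef struct Parser Parser;
    struct Parser {
        bool (*run)(const Parser *p, const char *in, size_t *pos);
        const Parser *const *kids;  /* NULL-terminated, for seq/alt */
        char ch;                    /* for single-character parsers */
    };

    static bool parse(const Parser *p, const char *in, size_t *pos)
    {
        return p->run(p, in, pos);
    }

    /* match one literal character */
    static bool run_char(const Parser *p, const char *in, size_t *pos)
    {
        if (in[*pos] == p->ch) {
            (*pos)++;
            return true;
        }
        return false;
    }

    /* match all children in order, backtracking on failure */
    static bool run_seq(const Parser *p, const char *in, size_t *pos)
    {
        size_t save = *pos;
        for (const Parser *const *k = p->kids; *k != NULL; k++) {
            if (!parse(*k, in, pos)) {
                *pos = save;
                return false;
            }
        }
        return true;
    }

    /* try each child in turn; the first one that matches wins */
    static bool run_alt(const Parser *p, const char *in, size_t *pos)
    {
        for (const Parser *const *k = p->kids; *k != NULL; k++) {
            size_t save = *pos;
            if (parse(*k, in, pos))
                return true;
            *pos = save;
        }
        return false;
    }

    /* grammar: greeting = ("h" "i") | ("h" "o") */
    static const Parser h = { run_char, NULL, 'h' };
    static const Parser i = { run_char, NULL, 'i' };
    static const Parser o = { run_char, NULL, 'o' };
    static const Parser *const hi_kids[] = { &h, &i, NULL };
    static const Parser *const ho_kids[] = { &h, &o, NULL };
    static const Parser hi = { run_seq, hi_kids, 0 };
    static const Parser ho = { run_seq, ho_kids, 0 };
    static const Parser *const greeting_kids[] = { &hi, &ho, NULL };
    static const Parser greeting = { run_alt, greeting_kids, 0 };

    int main(void)
    {
        size_t pos = 0;
        puts(parse(&greeting, "ho", &pos) ? "match" : "no match");
        return 0;
    }

In a language with closures, the static tables above become one-liners like alt(seq(ch('h'), ch('i')), seq(ch('h'), ch('o'))), which is why combinator libraries are popular outside C.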

On Mon, Sep 17, 2018 at 08:54:29AM -0400, Matthew Gordon via talk wrote:
| A lot of languages now have "parser combinator" libraries which make it
| even easier to write a recursive descent parser. I'd do a google search
| to see if there's one available for your programming language of choice.

Language is C; the environment is the embedded board of consumer/business printers; and the nature of the work is QA, where I'm trying to introduce more automation. So, for C, what parser library would you use? I take it there is a difference between a "parser generator" and a "parser combinator".

--
William Park <opengeometry@yahoo.ca>

On 09/17/2018 03:35 PM, William Park via talk wrote:
| Language is C; the environment is the embedded board of
| consumer/business printers; and the nature of the work is QA, where I'm
| trying to introduce more automation. So, for C, what parser library
| would you use? I take it there is a difference between a "parser
| generator" and a "parser combinator".

Having worked in building and supporting compilers years ago, I would argue that Yacc and Lex are a better way to build a parser, because they force you to think a bit about design. I tend to like tools that have a formal underpinning and work from a design, over ad-hoc systems where the result is directly tied to the ability of the programmer to think ahead and avoid problems.
It sounds like you're looking to build a domain-specific language, and you may have more luck finding an existing language that you can use. Building and maintaining a programming language can quickly become a full-time job, and an amazingly thankless one.

--
Alvin Starr             ||   land: (905)513-7688
Netvel Inc.             ||   Cell: (416)806-0133
alvin@netvel.net        ||

On Mon, Sep 17, 2018, 15:36 William Park via talk, <talk@gtalug.org> wrote:
| Language is C; the environment is the embedded board of
| consumer/business printers

So is this embedded as in "limited memory", and printers as in "horrid binary protocols"? That environment might not be a good fit for this kind of parser.

Stewart

On 2018-09-15 01:10 AM, William Park via talk wrote:
| Now, I want to take some course on Lex/Yacc, finally!
| What online course do you recommend?

I have used Lex/Yacc in the past to create a custom "language" for a program installation system. I just read up on the programs, as there was nothing "online" available at the time.

I like working with parsers. I just don't get to do it very often any more.

I see some people suggest avoiding Yacc/Lex if possible. I think they are useful tools. They may be particularly helpful in keeping you agile while the "language" you need to parse is still in a state of flux. If you need a more compact (or faster?) parsing solution, the Yacc/Lex method can guide the design of a custom parser written in your main programming language of choice.

A quick search turned up many tutorials that are available on YouTube. I also found a Yacc crash course (part of a Ruby Hacking Guide), and The LEX & YACC Page.

In case you were not aware, there are two equivalent programs in the GNU suite of tools for Lex and Yacc. They are Flex and Bison, respectively. You can also look for information on those two programs.

--
Cheers!

Kevin.

http://www.ve3syb.ca/               | "Nerds make the shiny things that
https://www.patreon.com/KevinCozens | distract the mouth-breathers, and
                                    | that's why we're powerful"
Owner of Elecraft K2 #2172          |
                                    | #include <disclaimer/favourite>
                                    | --Chris Hardwick

participants (7):

- ac
- Alvin Starr
- D. Hugh Redelmeier
- Kevin Cozens
- Matthew Gordon
- Stewart Russell
- William Park