
| From: William Park via talk <talk@gtalug.org>
| Every now and then, I come across situation where an ideal solution
| would have been a custom parser/interpreter, either to parse data format
| or to write my own sets of commands. In each case, I don't know enough
| to write it (I'm not Comp Sci) or don't have time. So, I end up
| resorting to shell, awk, python, etc.
|
| Now, I want to take some course on Lex/Yacc, finally!
| What online course do you recommend?

I've never used online courses, certainly not about this stuff, so I
cannot answer your question directly.

What is it that you hope to learn?
- theory?
- practical use of these specific tools?
- for what problems are these tools a good choice?

What languages are you comfortable using to program? I ask because many
languages come with tools or libraries that might fully or partially
address your needs.

TL;DR: from here on is a long discussion of my experiences with tools
like these.

I don't have the habit of using lex. I write my own lexers in C. Not
recommended: C is an unforgiving and unhelpful language. But I've done
that for 40+ years (and mostly in assembly languages for the decade
before that), so I can do it in my sleep.

I think lexing is pretty easy. But maybe that's because I've already
thought about most of the choices that need to be made. Perhaps lex
provides less experienced programmers a framework that guides some of
those choices.

I don't have the habit of using yacc either. yacc is a parser generator
for LALR grammars. The construction of LALR parsers was first described
in DeRemer's PhD thesis, but the first implementation was by Wilf
LaLonde at U of T. For some reason that I don't remember, LaLonde moved
for a bit to Waterloo, and I (an undergrad) found his generator on the
Waterloo computer (but no documentation). I played with it,
reverse-engineered how the generated tables worked, and started using
it. So I was one of the first to use such a gizmo.

It was intoxicating.
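(An aside, to make the "lexing is pretty easy" claim concrete: below is
the entire shape of a hand-rolled lexer in C. The token set and all the
names are invented for this example; it is a sketch of the technique,
not code from any project I mention.)

```c
#include <ctype.h>
#include <assert.h>

/* Token kinds for a toy expression language (invented for this example). */
typedef enum {
    TK_EOF, TK_IDENT, TK_NUMBER,
    TK_PLUS, TK_STAR, TK_LPAREN, TK_RPAREN,
    TK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;   /* points into the source text; no copying */
    int len;
} Token;

typedef struct {
    const char *src;
    int pos;
} Lexer;

/* The whole lexer: skip whitespace, then dispatch on the first character. */
Token next_token(Lexer *lx)
{
    while (isspace((unsigned char)lx->src[lx->pos]))
        lx->pos++;

    const char *s = lx->src + lx->pos;
    Token t = { TK_ERROR, s, 1 };

    if (*s == '\0') {
        t.kind = TK_EOF;
        t.len = 0;                       /* don't advance past the end */
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        t.kind = TK_IDENT;
        t.len = 0;
        while (isalnum((unsigned char)s[t.len]) || s[t.len] == '_')
            t.len++;
    } else if (isdigit((unsigned char)*s)) {
        t.kind = TK_NUMBER;
        t.len = 0;
        while (isdigit((unsigned char)s[t.len]))
            t.len++;
    } else if (*s == '+') { t.kind = TK_PLUS; }
    else if (*s == '*') { t.kind = TK_STAR; }
    else if (*s == '(') { t.kind = TK_LPAREN; }
    else if (*s == ')') { t.kind = TK_RPAREN; }
    /* anything else stays TK_ERROR, one character long */

    lx->pos += t.len;
    return t;
}
```

Most of the real choices here (how to report errors, whether tokens
point into the source or copy it, what to do with an unexpected
character) are design decisions rather than hard code. lex generates
roughly this same loop from a specification.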
I invented more and more complicated language constructs because the
complexity cost me almost nothing. My particular project was to design
a high-level assembler for the PDP-11: think PL/360 (Niklaus Wirth's
high-level assembler for the IBM System/360).

I now think that this was a mistake. Parsing is not the hardest problem
in writing compilers. A hand-rolled recursive descent parser is fairly
easy and allows for clearer diagnostics (diagnostics are important!).
But I did learn a lot: complex language syntax is a burden on the user.
Languages that didn't learn that lesson include C++, Algol 68, and
PL/I.

In the Libreswan project, Pluto includes a simple hand-rolled lexer and
parser for one kind of config file. I wrote that. After I left the
project, someone added a lex + yacc parser for a different kind of
config file. When I rejoined the project, the lex + yacc code didn't
work right and nobody knew how to fix it. I've kicked the worst
problems out of it, but I think I could actually reduce the lines of
code and the bugs by rewriting it to use the lexer that is already in
Pluto plus a minimal recursive descent parser.

LALR parsing is a very interesting discipline. I'm glad I understand it
(although it is a bit foggy after all these years). I'm not sure that
it is useful for most programmers.

Lexing and parsing are things programmers do all the time, informally,
even if they don't know that's what they are doing. Understanding the
problems formally can only help.

I learned this stuff from books, people, and hacking. The first book to
really present it with a practical parser generator at the core was
McKeeman, Horning, and Wortman: A Compiler Generator. (I took courses
from the latter two and met McKeeman.) LaLonde's LALR parser generator
slotted into that framework and was much better than the parser
generator that they used.

(Kildall's PL/M was clearly created with and inspired by this work. He
then used it to create CP/M.
Which was copied to create MS-DOS.)

The field was relatively young and approachable then. Compiler
technology has gotten far more developed and harder to approach. As a
result, amateurs jump in without the full background and make
long-understood mistakes. I don't see a good alternative: there is too
much to learn before doing useful work.

The hardest part of compiler writing is code generation and
optimization (the "backend"). For many "languages" (like the Libreswan
control files) there is no need for code generation. Recent language
implementations often use the LLVM backend.

There is a concept of DSLs (Domain Specific Languages): small,
special-purpose languages. Perhaps that's what you are trying to
create. Googling might turn up light-weight tools to help with that.
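To put some weight behind the claim that a minimal recursive descent
parser is fairly easy, here is a sketch in C for a toy expression
grammar. The grammar and all names are invented for illustration (this
is not the Libreswan config grammar): one function per grammar rule,
evaluating as it parses.

```c
#include <ctype.h>
#include <assert.h>

/* Grammar (invented for this example):
 *   expr   -> term ('+' term)*
 *   term   -> factor ('*' factor)*
 *   factor -> NUMBER | '(' expr ')'
 */

typedef struct {
    const char *src;
    int pos;
    int error;  /* a real parser would record what was expected, and where */
} Parser;

void skip_ws(Parser *p)
{
    while (isspace((unsigned char)p->src[p->pos]))
        p->pos++;
}

long parse_expr(Parser *p);  /* forward declaration: the grammar is recursive */

long parse_factor(Parser *p)
{
    skip_ws(p);
    if (p->src[p->pos] == '(') {
        p->pos++;
        long v = parse_expr(p);
        skip_ws(p);
        if (p->src[p->pos] == ')')
            p->pos++;
        else
            p->error = 1;    /* we know exactly what's missing, and where */
        return v;
    }
    if (isdigit((unsigned char)p->src[p->pos])) {
        long v = 0;
        while (isdigit((unsigned char)p->src[p->pos]))
            v = v * 10 + (p->src[p->pos++] - '0');
        return v;
    }
    p->error = 1;            /* expected a number or '(' */
    return 0;
}

long parse_term(Parser *p)
{
    long v = parse_factor(p);
    skip_ws(p);
    while (p->src[p->pos] == '*') {
        p->pos++;
        v *= parse_factor(p);
        skip_ws(p);
    }
    return v;
}

long parse_expr(Parser *p)
{
    long v = parse_term(p);
    skip_ws(p);
    while (p->src[p->pos] == '+') {
        p->pos++;
        v += parse_term(p);
        skip_ws(p);
    }
    return v;
}
```

Because you own the control flow, each error path knows which rule it
was in and what it expected. That is where hand-rolled parsers beat
generated ones on diagnostics.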