21: Building a compiler. Part 2 - JSToElm
All Episodes

21: Building a compiler. Part 2

In full swing building our ~~compiler~~ interpreter. We've got our lexer working nicely. Now for the meaty part of it. The process of taking all the source code that is now tokenized, and outputting into a data structure is the job of the parser. This output structure is called a 'syntax tree' or 'abstract syntax tree', did you checkout last week's pick

Parsing

Technical details of Meow’s parser

  • Two main ways to tackle parsing

    1. top-down
    2. building the root node of the AST and then working downward and out
    3. When the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol to the input, it is called top-down parsing.
    4. bottom-up
    5. goes the other way. lol.
    6. As the name suggests, bottom-up parsing starts with the input symbols and tries to construct the parse tree up to the start symbol.
    7. Within this are a few variations.
  • Meow will be a recursive descent parser

    • top down operator precedence,
    • called the “Pratt parser”, after Vaughan Pratt

Parse Generator

  • Dude, I know, well I didn’t know, but now I vaguely know.
  • Sort of defeats the purpose of learning to just use the off the self parse generator.

The actual parsing…

  • starting with ‘let’ valid Meow code
let x = 10;
let y = 15;

let add = fn(a,b){
  return a + b;
}
  • Statements and Expressions

  • Simply put, expressions produce values - statements don’t. SIMPLE

  • What a node for a variable binding in the let x = 5; would be ?

Recursive Descent Parser

  • Parsing token and tokenPeek to see what the next token is to invoke the right parsing function

  • Then calling parsing function based on the token type.

  • parseLetStatement or parseIdentifier

  • As we start to parse we are checking tokens, and peeking at the next token, but there are troubles!

    • prefix operators !true
    • infix operators 5 + 5
    • comparison operators x == y
    • call expressions add(2,3) or add(add(2,3), 12)
    • oh noes, identifiers are expressions too! bar * foo / foobar
    • first class functions & function literals
  • But it gets tricky when you get to expressions as you can see.

  • You have to write separate functions for each level of precedence (JavaScript has 17 of them, for example),

  • Pratt to the rescue.

    • BNF grammars and their various offspring.

    • He’s talking about parsing generators. where we feed them some formal description, and they output the parser for you.
    • technique is simple to understand, trivial to implement, easy to use, extremely efficient, and very flexible

    • Douglas Crockford, “Another explanation is that his technique is most effective when used in a dynamic, functional programming language.”
    • Douglas Crockford, “JavaScript is an ideal language for exploiting Pratt’s technique”

Parsing Expressions

  • a bit more difficult bc a let can only appear once at the beginning of statement. HOWEVER in an expression, tokens of the same type, can appear in multiple positions!!!!!

  • association of parsing functions called “semantic code” with token types.

  • whenever these token types are encountered the parsing functions are called

  • each token type can have up to two parsing functions associated with it, depending if the token is in prefix or infix

  • so we have two parsing functions

    1. prefixParseFn - ‘nuds’ or ‘null denotations’
    2. infixParseFn(ast.Expression) - ‘leds’ or ‘left denotations’
  • Both return an ast.Expression, but notice that infixParseFn takes an argument, prefixFarseFn does not.

    • the argument is to the left of the infix operator
  • parsingExpression we use an enum iota, so that we can assign precedence for infix’s like >= + * !X

    • Automate enum values with: iota → A numeric universal counter starting at 0 → Used only with constant declarations
  • Executing Pratt’s Parser

    • don’t represent every operator / operand, but rather nest the nodes correctly
    • treat the expression as a whole and then loop over it from left to right, using a set of conditionals if the next token is a not SEMICOLON && the precedence of the the following token is greater than the precedence of the current token.
    • higher precedence is to be deeper in the AST.
    • right-binding power
    • left-binding power
    • these are the parts of precedence < p.peekPrecedence where right-binding power is precedence and left-binding is peekPrecedence

Abstract Syntax Tree

What we’re shooting for, more or less. AST

Picks

  • Interview with Simon Peyton-Jones

    • Simon Peyton-Jones (Microsoft Research Cambridge) researches the implementations and applications of functional programming languages. He was heavily involved in the design of the Haskell programming language and the development of the Glasgow Haskell Compiler (GHC). We talk about seeing functional programming go from intellectual revolution to practical reality and the importance of investing in programming education.
  • The Good War

    • How America’s infatuation with World War II has eroded our conscience by Mike Dawson and Chris Hayes

Resources

Follow

Published 18 Jan 2018

A show about learning Elm, Functional Programing, and generally leveling up as a JS developer.
JavaScript To Elm on Twitter