Front End and Compilation Principle-Write a JS Interpreter with JS

图片描述

When it comes to compiling principles, the impression is often confined to those dull courses and obscure concepts in undergraduate education. As a front-end developer, the compilation principle seems to be far away from us, and its understanding is probably limited to “abstract syntax tree (AST)”. But this is only the beginning. The use of compilation principles can even enable us to directly write an interpreter that can run JS code using JS.

Project address:https://github.com/jrainlau/c …

Online experience:https://codepen.io/jrainlau/p …

First, why write JS interpreter with JS

Students who have come into contact with small program development should know that the environment in which small programs run is forbidden.new Function,evalSuch as the use of methods, resulting in our inability to directly execute dynamic code in the form of strings. In addition, many platforms also restrict the methods of executing dynamic code provided by JS, so we have no choice? In this case, we can use JS to write a parser and let JS run itself.

Before we begin, let’s briefly review some concepts of compilation principles.

Second, what is a compiler

When it comes to compiling principles, compilers are definitely indispensable. In short, when a piece of code passes through the stages of lexical analysis and syntax analysis of the compiler, a tree-like “abstract syntax tree (AST)” will be generated, and each node of the syntax tree corresponds to a fragment with different meanings in the code.

For example, there is such a code:

const a = 1
 console.log(a)

After being processed by the compiler, its AST looks like this:

{
 "type": "Program",
 "start": 0,
 "end": 26,
 "body": [
 {
 "type": "VariableDeclaration",
 "start": 0,
 "end": 11,
 "declarations": [
 {
 "type": "VariableDeclarator",
 "start": 6,
 "end": 11,
 "id": {
 "type": "Identifier",
 "start": 6,
 "end": 7,
 "name": "a"
 },
 "init": {
 "type": "Literal",
 "start": 10,
 "end": 11,
 "value": 1,
 "raw": "1"
 }
 }
 ],
 "kind": "const"
 },
 {
 "type": "ExpressionStatement",
 "start": 12,
 "end": 26,
 "expression": {
 "type": "CallExpression",
 "start": 12,
 "end": 26,
 "callee": {
 "type": "MemberExpression",
 "start": 12,
 "end": 23,
 "object": {
 "type": "Identifier",
 "start": 12,
 "end": 19,
 "name": "console"
 },
 "property": {
 "type": "Identifier",
 "start": 20,
 "end": 23,
 "name": "log"
 },
 "computed": false
 },
 "arguments": [
 {
 "type": "Identifier",
 "start": 24,
 "end": 25,
 "name": "a"
 }
 ]
 }
 }
 ],
 "sourceType": "module"
 }

Common JS compilers includebabylon,acornWait, interested students can visitAST explorerThis website is self-experienced.

As you can see, the compiled AST records in detail the types and starting positions of all semantic codes in the code. In addition to the root nodeProgramIn addition, the body contains two nodesVariableDeclarationAndExpressionStatementAnd these nodes contain different child nodes.

It is precisely because AST records the semantic information of the code in detail that Babel, Webpack, Sass, Less and other tools can process the code very intelligently.

Three, what is the interpreter

Just as translators can not only understand a foreign language, but also translate it into their mother tongue after artistic processing, people call the tool that can convert code into AST “compiler” and the tool that can translate AST into target language and run it “interpreter”.

In the course of compiling principles, we have considered such a problem: how to make the computer run arithmetic expressions1+2+3:

1 + 2 + 3

When the machine executes, it may be machine code like this:

1 PUSH 1
 2 PUSH 2
 3 ADD
 4 PUSH 3
 5 ADD

The program running this machine code is the interpreter.

In this article, we won’t come up with such complicated things as machine code, just use JS to interpret AST of JS code in its runtime environment. Because the interpreter is written in JS, we can boldly use JS’s own language features, such as this binding, new keyword, etc., without any additional processing, thus making the implementation of JS interpreter very simple.

After reviewing the basic concepts of compilation principles, we can proceed with the development.

Four, node traverser

By analyzing AST above, we can see that each node has a type attribute.type, different types of nodes need different processing methods, the program that processes these nodes is “node processor (nodeHandler)”

Define a node processor:

const nodeHandler = {
 Program () {},
 VariableDeclaration () {},
 ExpressionStatement () {},
 MemberExpression () {},
 CallExpression () {},
 Identifier () {}
 }

The specific implementation of the node processor will be discussed in detail later and will not be expanded here for the time being.

With the node processor, we need to traverse every node in AST and recursively call the node processor until the whole grammar book is processed.

Defines a node traverser (NodeIterator):

class NodeIterator {
 constructor (node) {
 this.node = node
 this.nodeHandler = nodeHandler
 }
 
 traverse (node) {
 //Find the corresponding function in the node processor according to the node type
 const _eval = this.nodeHandler[node.type]
 //If not found, report an error
 if (!  _eval) {
 throw new Error(`canjs: Unknown node type "${node.type}".`)
 }
 //Run processing function
 return _eval(node)
 }
 
 }

In theory, the node traverser can be designed in this way, but after careful deliberation, it is found that a very important thing-scope processing is missing.

Back to the node processorVariableDeclaration()Method, which is used to handle issues such asconst a = 1Such variables declare nodes. Suppose its code is as follows:

VariableDeclaration (node) {
 for (const declaration of node.declarations) {
 const { name } = declaration.id
 const value = declaration.init ?  traverse(declaration.init) : undefined
 //Here comes the problem. After getting the name and value of the variable, where can I save it?
 //   ...
 }
 },

The problem is that after processing the variable declaration node, the variable should be saved. According to JS language features, this variable should be stored in a scope. During the implementation of JS parser, this scope can be defined as ascopeObject.

Rewrite the node traverser and add one for itscopeObject

class NodeIterator {
 constructor (node, scope = {}) {
 this.node = node
 this.scope = scope
 this.nodeHandler = nodeHandler
 }
 
 traverse (node, options = {}) {
 const scope = options.scope || this.scope
 const nodeIterator = new NodeIterator(node, scope)
 const _eval = this.nodeHandler[node.type]
 if (!  _eval) {
 throw new Error(`canjs: Unknown node type "${node.type}".`)
 }
 return _eval(nodeIterator)
 }
 
 createScope (blockType = 'block') {
 return new Scope(blockType, this.scope)
 }
 }

Then node processing functionVariableDeclaration()You can passscopeSaved variables:

VariableDeclaration (nodeIterator) {
 const kind = nodeIterator.node.kind
 for (const declaration of nodeIterator.node.declarations) {
 const { name } = declaration.id
 const value = declaration.init ?  nodeIterator.traverse(declaration.init) : undefined
 //Define variables in scope
 //If the current scope is block-level and the variable is defined by var, it is defined to the parent scope
 if (nodeIterator.scope.type === 'block' && kind === 'var') {
 nodeIterator.scope.parentScope.declare(name, value, kind)
 } else {
 nodeIterator.scope.declare(name, value, kind)
 }
 }
 },

Regarding scope processing, it can be said to be the most difficult part of the whole JS interpreter. Next, we will conduct an in-depth analysis of scope processing.

V. Scope Treatment

Considering such a situation:

const a = 1
 {
 const b = 2
 console.log(a)
 }
 console.log(b)

The result of the operation must be able to print outa, and then report an error:Uncaught ReferenceError: b is not defined

This code is about scope. A block-level scope or a function scope can read the variables in its parent scope, but not vice versa. Therefore, we cannot simply define an empty object for the scope, but need to deal with it specially.

Defines a scope base classScope

class Scope {
 constructor (type, parentScope) {
 //Scope Type, Distinguishing function Scope Function from block Level Scope Block
 this.type = type
 //parent scope
 this.parentScope = parentScope
 //Global Scope
 this.globalDeclaration = standardMap
 //Variable space of current scope
 this.declaration = Object.create(null)
 }
 
 /*
 * get/set method is used to get/set the variable value of the corresponding name in the current scope
 According to JS syntax rules, it is preferred to search from the current scope. If it cannot be found, it is found in the parent scope and then in the global scope.
 If there is none, report the error
 */
 get (name) {
 if (this.declaration[name]) {
 return this.declaration[name]
 } else if (this.parentScope) {
 return this.parentScope.get(name)
 } else if (this.globalDeclaration[name]) {
 return this.globalDeclaration[name]
 }
 throw new ReferenceError(`${name} is not defined`)
 }
 
 set (name, value) {
 if (this.declaration[name]) {
 this.declaration[name] = value
 } else if (this.parentScope[name]) {
 this.parentScope.set(name, value)
 } else {
 throw new ReferenceError(`${name} is not defined`)
 }
 }
 
 /**
 * according to the kind of variable call different variable definition method
 */
 declare (name, value, kind = 'var') {
 if (kind === 'var') {
 return this.varDeclare(name, value)
 } else if (kind === 'let') {
 return this.letDeclare(name, value)
 } else if (kind === 'const') {
 return this.constDeclare(name, value)
 } else {
 throw new Error(`canjs: Invalid Variable Declaration Kind of "${kind}"`)
 }
 }
 
 varDeclare (name, value) {
 let scope = this
 //If there is a parent scope of non-function type in the current scope, define the variable to the parent scope.
 while (scope.parentScope && scope.type !  == 'function') {
 scope = scope.parentScope
 }
 this.declaration[name] = new SimpleValue(value, 'var')
 return this.declaration[name]
 }
 
 letDeclare (name, value) {
 //Duplicate definition is not allowed
 if (this.declaration[name]) {
 throw new SyntaxError(`Identifier ${name} has already been declared`)
 }
 this.declaration[name] = new SimpleValue(value, 'let')
 return this.declaration[name]
 }
 
 constDeclare (name, value) {
 //Duplicate definition is not allowed
 if (this.declaration[name]) {
 throw new SyntaxError(`Identifier ${name} has already been declared`)
 }
 this.declaration[name] = new SimpleValue(value, 'const')
 return this.declaration[name]
 }
 }

Here we use a system calledsimpleValue()Function to define variable values, mainly used to process constants:

class SimpleValue {
 constructor (value, kind = '') {
 this.value = value
 this.kind = kind
 }
 
 set (value) {
 //It is forbidden to reassign const type variables.
 if (this.kind === 'const') {
 throw new TypeError('Assignment to constant variable')
 } else {
 this.value = value
 }
 }
 
 get () {
 return this.value
 }
 }

The key to the way to deal with the scope problem lies in the JS language itself looking for the characteristics of variables-giving priority to the current scope, followed by the parent scope, and finally the global scope. In turn, the function is processed at the nodeVariableDeclaration()If a block-level scope is encountered and the keyword isvar, you need to define this variable into the parent scope, which is what we often call “global variable pollution.”

JS standard library injection

Careful readers will find that in the definitionScopeBase class, its global scopeglobalScopeOne has been assignedstandardMapObject, this object is JS standard library.

Simply put, JS standard library is a series of methods and attributes that JS itself carries, such as common onessetTimeout,console.logWait. In order for the parser to perform these methods, we need to inject a standard library into it:

const standardMap = {
 console: new SimpleValue(console)
 }

This is equivalent to injecting into the global scope of the parserconsoleThis object can also be used directly.

VI. Node Processor

After processing the node traverser and scope processing, the node processor can be written. As the name implies, the node processor is specially used to process AST nodes, as mentioned repeatedly aboveVariableDeclaration()The method is one of them. Some key node processors will be explained below.

Before the node processor is developed, a tool is needed to judge which of the JS statementsreturn,break,continueKey words.

Keyword judgment toolSignal

Define aSignalBase class:

class Signal {
 constructor (type, value) {
 this.type = type
 this.value = value
 }
 
 static Return (value) {
 return new Signal('return', value)
 }
 
 static Break (label = null) {
 return new Signal('break', label)
 }
 
 static Continue (label) {
 return new Signal('continue', label)
 }
 
 static isReturn(signal) {
 return signal instanceof Signal && signal.type === 'return'
 }
 
 static isContinue(signal) {
 return signal instanceof Signal && signal.type === 'continue'
 }
 
 static isBreak(signal) {
 return signal instanceof Signal && signal.type === 'break'
 }
 
 static isSignal (signal) {
 return signal instanceof Signal
 }
 }

With it, you can judge and process the keywords in the statement, which will be of great use next.

1. Variable Definition Node Processor-VariableDeclaration()

One of the most commonly used node processors, responsible for registering variables to the correct scope.

VariableDeclaration (nodeIterator) {
 const kind = nodeIterator.node.kind
 for (const declaration of nodeIterator.node.declarations) {
 const { name } = declaration.id
 const value = declaration.init ?  nodeIterator.traverse(declaration.init) : undefined
 //Define variables in scope
 //If it is a block-level scope and the keyword is var, global pollution is required
 if (nodeIterator.scope.type === 'block' && kind === 'var') {
 nodeIterator.scope.parentScope.declare(name, value, kind)
 } else {
 nodeIterator.scope.declare(name, value, kind)
 }
 }
 },

2. Identifier Node Processor-Identifier()

The value specifically used to obtain the identifier from the scope.

Identifier (nodeIterator) {
 if (nodeIterator.node.name === 'undefined') {
 return undefined
 }
 return nodeIterator.scope.get(nodeIterator.node.name).value
 },

3. Character Node Processor-Literal()

Returns the value of a character node.

Literal (nodeIterator) {
 return nodeIterator.node.value
 }

4. Expression Call Node Processor-CallExpression()

A processor used to process expression calling nodes, such as processingfunc(),console.log()Wait.

CallExpression (nodeIterator) {
 //Traverse callee to Get Function Body
 const func = nodeIterator.traverse(nodeIterator.node.callee)
 //Get Parameters
 const args = nodeIterator.node.arguments.map(arg => nodeIterator.traverse(arg))
 
 let value
 if (nodeIterator.node.callee.type === 'MemberExpression') {
 value = nodeIterator.traverse(nodeIterator.node.callee.object)
 }
 //Return the function running result
 return func.apply(value, args)
 },

5. Expression Node Processor-MemberExpression()

Distinguished from the “expression call node processor” above, expression nodes refer toperson.say,console.logThis kind of function expression.

MemberExpression (nodeIterator) {
 //Get an object, such as console
 const obj = nodeIterator.traverse(nodeIterator.node.object)
 //Method of obtaining object, such as log
 const name = nodeIterator.node.property.name
 //return an expression, such as console.log
 return obj[name]
 }

6, block level declaration node processor-BlockStatement()

A very common processor, specially used for processing block-level declaration nodes, such as functions, loops,try...catch ...The scene in the middle.

BlockStatement (nodeIterator) {
 //Define a block-level scope first
 let scope = nodeIterator.createScope('block')
 
 //Process each node within the block level node
 for (const node of nodeIterator.node.body) {
 if (node.type === 'VariableDeclaration' && node.kind === 'var') {
 for (const declaration of node.declarations) {
 scope.declare(declaration.id.name, declaration.init.value, node.kind)
 }
 } else if (node.type === 'FunctionDeclaration') {
 nodeIterator.traverse(node, { scope })
 }
 }
 
 //extract keywords (return, break, continue)
 for (const node of nodeIterator.node.body) {
 if (node.type === 'FunctionDeclaration') {
 continue
 }
 const signal = nodeIterator.traverse(node, { scope })
 if (Signal.isSignal(signal)) {
 return signal
 }
 }
 }

You can see there are two in this processor.for...ofCycle. The first is used to process statements within the block level, and the second is specifically used to identify keywords, such as those inside the loop bodybreak,continueOr inside a function bodyreturn.

7. Function Definition Node Processor-FunctionDeclaration()

Declare a variable with the same name as the function in action, and the value is the defined function:

FunctionDeclaration (nodeIterator) {
 const fn = NodeHandler.FunctionExpression(nodeIterator)
 nodeIterator.scope.varDeclare(nodeIterator.node.id.name, fn)
 return fn
 }

8. Function Expression Node Processor-FunctionExpression()

Used to define a function:

FunctionExpression (nodeIterator) {
 const node = nodeIterator.node
 /**
 * 1. To define a function, you need to define a function scope for it first, and it is allowed to inherit the parent scope.
 * 2. Register' this`, `arguments' and the variable space whose parameters are scoped
 * 3. Check return keyword
 * 4, define the function name and length
 */
 const fn = function () {
 const scope = nodeIterator.createScope('function')
 scope.constDeclare('this', this)
 scope.constDeclare('arguments', arguments)
 
 node.params.forEach((param, index) => {
 const name = param.name
 scope.varDeclare(name, arguments[index])
 })
 
 const signal = nodeIterator.traverse(node.body, { scope })
 if (Signal.isReturn(signal)) {
 return signal.value
 }
 }
 
 Object.defineProperties(fn, {
 name: { value: node.id ?  node.id.name : '' },
 length: { value: node.params.length }
 })
 
 return fn
 }

9. this Expression Processor-ThisExpression()

The processor directly uses the JS language’s own characteristics tothisThe keyword can be removed from the scope.

ThisExpression (nodeIterator) {
 const value = nodeIterator.scope.get('this')
 return value ?  value.value : null
 }

10. new Expression Processor-NewExpression()

AndthisThe expression is similar and directly follows the language features of JS. After obtaining functions and parameters, the expression is passed through thebindKeyword generates a constructor and returns.

NewExpression (nodeIterator) {
 const func = nodeIterator.traverse(nodeIterator.node.callee)
 const args = nodeIterator.node.arguments.map(arg => nodeIterator.traverse(arg))
 return new (func.bind(null, ...args))
 }

11. For Loop Node Processor-ForStatement()

The three parameters of the For loop correspond to theinit,test,updateAttribute, call the node processor to handle the three attributes respectively, and put them back into JS native for loop.

ForStatement (nodeIterator) {
 const node = nodeIterator.node
 let scope = nodeIterator.scope
 if (node.init && node.init.type === 'VariableDeclaration' && node.init.kind !  == 'var') {
 scope = nodeIterator.createScope('block')
 }
 
 for (
 node.init && nodeIterator.traverse(node.init, { scope });
 node.test ?   nodeIterator.traverse(node.test, { scope }) : true;
 node.update && nodeIterator.traverse(node.update, { scope })
 ) {
 const signal = nodeIterator.traverse(node.body, { scope })
 
 if (Signal.isBreak(signal)) {
 break
 } else if (Signal.isContinue(signal)) {
 continue
 } else if (Signal.isReturn(signal)) {
 return signal
 }
 }
 }

Similarly,for...in,whileAnddo...whileLoop is also a similar processing method and will not be repeated here.

12. If declaration node processor-IfStatemtnt()

Handle If statements, includingif,if...else,if...elseif...else.

IfStatement (nodeIterator) {
 if (nodeIterator.traverse(nodeIterator.node.test)) {
 return nodeIterator.traverse(nodeIterator.node.consequent)
 } else if (nodeIterator.node.alternate) {
 return nodeIterator.traverse(nodeIterator.node.alternate)
 }
 }

Similarly,switchStatements and ternary expressions are treated in a similar way.

The above lists several important node processors. There are still many nodes to be processed in es5, and the details can be accessed.This addressFind out.

Seven, define the call way

After all the above steps, the parser already has the ability to process es5 code. The next step is to assemble these loose contents and finally define a method convenient for users to call.

const { Parser } = require('acorn')
 const NodeIterator = require('./iterator')
 const Scope = require('./scope')
 
 class Canjs {
 constructor (code = '', extraDeclaration = {}) {
 this.code = code
 this.extraDeclaration = extraDeclaration
 this.ast = Parser.parse(code)
 this.nodeIterator = null
 this.init()
 }
 
 init () {
 //Define a global scope whose type is function scope
 const globalScope = new Scope('function')
 //Define global variables outside the standard library according to the input parameters
 Object.keys(this.extraDeclaration).forEach((key) => {
 globalScope.addDeclaration(key, this.extraDeclaration[key])
 })
 this.nodeIterator = new NodeIterator(null, globalScope)
 }
 
 run () {
 return this.nodeIterator.traverse(this.ast)
 }
 }

Here we have defined a system calledCanjsThe base class of, accepts JS code in string form, and can define variables outside the standard library. When runningrun()Method, you can get the running results.

VIII. Follow-up

At this point, the entire JS parser has been completed and can run ES5 code well (there may be bugs that haven’t been found). However, in the current implementation, all the running results are placed in a place similar to a sand box and cannot affect the outside world. If you want to take out the running results, there are two possible ways. The first is to pass in a global variable, to influence the global variable, and to bring out the results with it. The other is to let the parser support itexportGrammar, can putexportThe result of statement declaration is returned, and interested readers can study it by themselves.

Finally, this JS parser has already opened source on my Github, welcome to exchange ~

https://github.com/jrainlau/c …

Reference material

Write a Javascript parser from scratch

The WeChat applet has to force the code to be changed. The goose factory refuses to accept your call.

jkeylu/evil-eval