Truffle is a powerful language implementation framework designed to simplify the creation of high-performance language implementations. It provides a set of tools and APIs that allow developers to focus on defining the semantics of their language, while Truffle handles the complexities of optimization and execution.
When we write a program, we are typically using a high-level programming language like Python, Java, or C. However, computers don’t understand these human-readable languages directly. They can only understand machine code, which consists of binary instructions specific to the hardware (CPU). So, before a program written in a high-level language can run, it needs to be translated into machine code.
There are two common ways to do this translation: interpreting or compiling.
Interpreter: An interpreter reads the source code of a program line by line, and directly executes it. It doesn’t produce machine code beforehand; instead, it works on the fly. So when you run a program with an interpreter, the interpreter processes the code, executes it, and outputs the result.
Compiler: A compiler, on the other hand, takes the entire source code and translates it ahead of time, either directly into machine code or into an intermediate form such as bytecode (which is also not human-readable). Machine code can then be executed directly by the CPU without needing the original source code again; bytecode is executed by a virtual machine.
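To make the distinction concrete, here is a toy sketch in Python (purely illustrative, not from any real implementation): an interpreter that executes a tiny expression language directly, next to a "compiler" that translates the same expressions into Python source, which can then run without the original form.

```python
# A toy interpreter and "compiler" for a tiny expression language.
# Expressions are nested tuples like ("+", 1, ("*", 2, 3)).

def interpret(expr):
    """Directly execute the expression, on the fly."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    l, r = interpret(left), interpret(right)
    return l + r if op == "+" else l * r

def compile_expr(expr):
    """Translate the expression into Python source (our stand-in for
    machine code), which can later run without the tuple form."""
    if isinstance(expr, int):
        return str(expr)
    op, left, right = expr
    return f"({compile_expr(left)} {op} {compile_expr(right)})"

program = ("+", 1, ("*", 2, 3))
print(interpret(program))            # 7
compiled = compile_expr(program)     # "(1 + (2 * 3))"
print(eval(compiled))                # 7
```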
Other ways to translate source code
There are also hybrid approaches, like Just-In-Time (JIT) compilation, which combines the benefits of both interpreting and compiling. A JIT compiler reads the source code, just like an interpreter, but instead of executing it directly, it compiles the code into machine code on the fly. This allows for optimizations that can improve performance significantly.
Another approach is transpilation, where the source code is translated into another high-level language instead of machine code, and then that language is compiled or interpreted using existing tools. For example, TypeScript is transpiled into JavaScript, which can then be executed by a JavaScript engine.
Now, both interpreters and compilers are themselves programs, and we can think of them as transforming inputs into outputs. We can represent each as a function:
A program P takes inputs (e.g., numbers, data) and produces outputs. So we can write P:(i1,i2,⋯)→(o1,o2,⋯), where i1,i2,⋯ are inputs and o1,o2,⋯ are outputs.
An interpreter I takes the source code of a program (SP) and the inputs, and produces the same outputs. So, we can write it like this I:(SP,i1,i2,⋯)→(o1,o2,⋯).
A compiler C takes only the source code, and instead of producing outputs directly, it produces the executable program P defined above, which can then be run on the inputs to produce the outputs. So we can write: C:(SP)→P.
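These three signatures can be sketched as ordinary Python functions, using a hypothetical one-expression language whose only program doubles its input (eval here is a stand-in for real interpretation and compilation):

```python
source = "x * 2"  # S_P: the program's source code

def program(x):                    # P: (i1, ...) -> (o1, ...)
    return x * 2

def interpreter(src, x):           # I: (S_P, i1, ...) -> (o1, ...)
    return eval(src, {"x": x})

def compiler(src):                 # C: (S_P) -> P
    return lambda x: eval(src, {"x": x})

print(program(21))                 # 42
print(interpreter(source, 21))     # 42
print(compiler(source)(21))        # 42
```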
There’s a clever optimization technique called partial evaluation that can help us make things more efficient. Here’s how it works:
Imagine you have a function f:(x,y)→z and you know the value of x ahead of time. You can “partially evaluate” f by fixing x and creating a new function f′:(y)→z that only depends on y. This makes the function simpler and faster to execute because we’ve eliminated one of the variables.
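Python's functools.partial gives a minimal illustration of fixing x (though, unlike a true partial evaluator, it only stores the argument; it does not simplify the body of f):

```python
from functools import partial

def f(x, y):              # f: (x, y) -> z
    return x * 10 + y

f_prime = partial(f, 3)   # fix x = 3, giving f': (y) -> z
print(f_prime(4))         # 34, same as f(3, 4)
```

A real partial evaluator would go further and emit a residual function whose body is already simplified, e.g. `lambda y: 30 + y`.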
Now, let's apply this idea to an interpreter.
If we have an interpreter I:(SP,i1,i2,⋯)→(o1,o2,⋯), we can “partially evaluate” it by fixing the source code SP. This gives us a new function I′:(i1,i2,⋯)→(o1,o2,⋯), which is effectively the same as running the program P itself. The only difference is that this new function is highly specialized and can be optimized for the specific source code SP.
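As a sketch, using Python's eval and compile as stand-ins (real partial evaluators transform the interpreter's code rather than wrapping it in closures):

```python
def interpret(src, x):
    # A naive interpreter: re-parses and re-dispatches on every call.
    return eval(src, {"x": x})

src = "x * x + 1"   # S_P, fixed ahead of time

# I' = I with S_P fixed. A closure captures the idea...
def i_prime(x):
    return interpret(src, x)

# ...but a partial evaluator goes further and produces the residual
# program, with the parsing/dispatch overhead folded away:
residual = compile(src, "<specialized>", "eval")
def i_prime_fast(x):
    return eval(residual, {"x": x})

print(i_prime(5), i_prime_fast(5))   # 26 26
```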
Futamura projections
The idea of partial evaluation is closely related to the Futamura projections, which are a set of three theorems that describe the relationship between interpreters, compilers, and partial evaluators.
flowchart LR
SP[/Source of Program/] --> I[Interpreter]
i[/Inputs/] --> I
I --> o[/Outputs/]
The first projection states that partially evaluating an interpreter with respect to a program's source code yields a specialized program, which is a compiled version of the original program, as we discussed above. This is what Truffle does under the hood.
flowchart LR
I[/Interpreter/] --> PE
SP[/Source of Program/] --> PE[Partial Evaluator]
PE --> P[/Program/]
i[/Inputs/] --> P'[Program]
P' --> o[/Outputs/]
The second projection observes that a partial evaluator is itself a program, and that the interpreter can be treated as a constant input. Hence, we can partially evaluate a partial evaluator PE with an interpreter I to get a specialized program PE′:(SP)→((pi1,pi2,⋯)→(po1,po2,⋯)), where SP is the remaining input to I and pik, pok are the inputs and outputs of the program. In other words, PE′ takes source code and produces an executable program: it is a compiler C.
flowchart LR
PE1[/Partial Evaluator/] --> PE[Partial Evaluator]
I[/Interpreter/] --> PE
PE --> C[/Compiler/]
SP[/Source of Program/] --> C'[Compiler]
C' --> P[/Program/]
i[/Inputs/] --> P'[Program]
P' --> o[/Outputs/]
The third projection states that if we partially evaluate a partial evaluator with respect to another partial evaluator, we get a compiler-compiler: a program that generates compilers from interpreters.
flowchart LR
PE1[/Partial Evaluator/] --> PE[Partial Evaluator]
PE2[/Partial Evaluator/] --> PE[Partial Evaluator]
PE --> CC[/Compiler Compiler/]
I[/Interpreter/] --> CC'[Compiler Compiler]
CC' --> C[/Compiler/]
SP[/Source of Program/] --> C'[Compiler]
C' --> P[/Program/]
i[/Inputs/] --> P'[Program]
P' --> o[/Outputs/]
References
Tom Stuart, Compilers for Free: https://tomstu.art/compilers-for-free
[1] Futamura, Y. Partial Evaluation of Computation Process—An Approach to a Compiler-Compiler.
[2] Jones-optimal partial evaluation by specialisation-safe normalisation.
Truffle comes with lots of optimizations happening automatically behind the scenes. For instance:
We can aggressively optimize I′ by constant-folding SP, since it's a constant after partial evaluation.
We can unroll the read-dispatch-execute loop in the interpreter, since we know what the source code looks like.
We can apply other optimizations that are typically applied to compiled programs, since I′ is a program itself.
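A sketch of what unrolling the read-dispatch-execute loop means, on a toy stack machine (the opcode names and bytecode format are invented for illustration):

```python
def run(program, x):
    # Generic interpreter: a read-dispatch-execute loop over opcodes.
    stack = [x]
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "MUL":
            stack.append(stack.pop() * stack.pop())
    return stack[-1]

# Bytecode for "x * 2 + 1":
prog = [("PUSH", 2), ("MUL", None), ("PUSH", 1), ("ADD", None)]

# Once `prog` is a constant, the loop and all opcode tests can be
# folded away, leaving the straight-line residual code a compiler
# would have produced:
def run_specialized(x):
    return x * 2 + 1

print(run(prog, 20), run_specialized(20))   # 41 41
```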
Other optimizations
Truffle is mainly focused on dynamic languages, where the types of variables are not known at compile time. For example, in Python, a variable can hold an integer in one execution and a string in another.
To be generic, the interpreter needs to check the types of variables at runtime and handle different cases accordingly. This can be extremely slow.
However, Truffle can apply type specialization to optimize the code for the specific types that are encountered during execution. If almost all calls to add are made with integers, Truffle can generate a specialized version of the function that only works with integers, making it faster, and fall back to the generic version if it encounters a different type. Truffle can also use call-target splitting to generate multiple versions of the function optimized for different types, and call the appropriate one based on the actual types encountered at runtime.
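A rough sketch of the guard-and-fallback idea in plain Python (the real mechanism in Truffle involves compiled machine code and deoptimization, not a Python if):

```python
def add_generic(a, b):
    # Slow path: handles every type the language allows.
    return a + b

def make_specialized_add():
    def add(a, b):
        if type(a) is int and type(b) is int:
            return a + b            # fast path: would be int-only machine code
        return add_generic(a, b)    # guard failed: fall back to the generic path
    return add

add = make_specialized_add()
print(add(2, 3))        # 5   (fast path)
print(add("a", "b"))    # ab  (fallback)
```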
Another issue with dynamic languages is dynamic object shapes. In statically typed languages like C and Java, the shape of an object (the fields it has) and its memory layout (where the fields are stored) are known at compile time, so the compiler can generate efficient code that accesses those fields in memory directly. In dynamic languages, objects can change their shape at runtime, which makes it hard to optimize memory access. Truffle can use inline caching to remember the shape of objects and generate specialized code that accesses the fields directly, without needing to look up the shape every time.
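A toy monomorphic inline cache for property reads might look like this (real engines cache a hidden-class/shape pointer and a field offset inside the compiled code; the shape tuple and value tuple here are stand-ins):

```python
def make_cached_getter(field):
    cache = {"shape": None, "index": None}
    def get(obj_shape, obj_values):
        if obj_shape is cache["shape"]:
            return obj_values[cache["index"]]   # cache hit: direct indexed load
        # Cache miss: do the slow lookup, then remember shape and offset.
        index = obj_shape.index(field)
        cache["shape"], cache["index"] = obj_shape, index
        return obj_values[index]
    return get

shape = ("x", "y")          # all objects with fields x and y share this shape
get_y = make_cached_getter("y")
print(get_y(shape, (10, 20)))   # 20 (slow lookup, cache filled)
print(get_y(shape, (30, 40)))   # 40 (cache hit, direct load)
```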
After optimization, I′ runs almost as fast as a compiled program, even though it's derived from an interpreter. Given the dynamic nature of languages like Python, JavaScript, or Ruby, it may even outperform a traditional compiler, because it can specialize the code for the specific inputs and types that are only known at runtime. Benchmarks show that Truffle-based interpreters can be four times faster than traditional interpreters such as CPython and CRuby, three times faster than the compiled JRuby, and only about 1.4× slower than compiled languages like Java or C.
This is the basic idea behind the Truffle framework: instead of manually implementing complex optimizations yourself, all you need to do is write a simple interpreter, and Truffle will take care of applying the performance improvements automatically.