Commit 2ce2883

docs: make README more approachable (python#41)
* docs: make README more approachable
* Add mermaid diagram, add evaluation

Co-Authored-By: Jules <[email protected]>

---------

Co-authored-by: Jules <[email protected]>
1 parent 426daf1 commit 2ce2883

2 files changed: +112 -28 lines changed

README.md

Lines changed: 112 additions & 28 deletions
@@ -2,33 +2,121 @@
22

33
A WIP Lazy Basic Block Versioning + (eventually) Copy and Patch JIT Interpreter for CPython.
44

5-
Python is a widely-used programming language. CPython is its reference implementation. Due to Python’s dynamic type semantics, CPython is generally unable to execute Python programs as fast as it potentially could with static type semantics.
5+
## The case for our project
66

7-
Last semester, while taking CS4215, we made progress on implementing a technique for removing type checks and other overheads associated with dynamic languages known as [Lazy Basic Block Versioning (LBBV)](https://arxiv.org/abs/1411.0352) in CPython. This work will be referred to as pyLBBV. More details can be found in our [technical report](https://github.com/pylbbv/pylbbv/blob/pylbbv/report/CPython_Tier_2_LBBV_Report_For_Repo.pdf). For an introduction to pyLBBV, refer to our [presentation](https://docs.google.com/presentation/d/e/2PACX-1vQ9eUaAdAgU0uFbEkyBbptcLZ4dpdRP-Smg1V499eogiwlWa61EMYVZfNEXg0xNaQvlmdNIn_07HItn/pub?start=false&loop=false&delayms=60000).
7+
Python is a widely-used programming language. As a Python user, I want my Python code to execute faster on CPython.
8+
9+
CPython is Python's reference implementation. Due to Python’s dynamic type semantics, CPython is generally unable to execute Python programs as fast as it potentially could with static type semantics.
10+
11+
## The solution
12+
13+
We fork CPython to implement and experiment with features that will make it faster. Our fork is called pyLBBVAndPatch.
14+
15+
### Features
16+
17+
This section will be laden with some CS programming language terminology. We will do our best to keep that to a minimum and explain our features as simply as possible.
18+
19+
#### Pre-Orbital
20+
21+
Before Orbital, while taking CS4215 in the previous semester, we had already achieved the following:
22+
23+
- Lazy basic block versioning interpreter
24+
- Basic type propagation
25+
- Type check removal
26+
- Basic tests
27+
- Comprehensive documentation
28+
29+
In short, [lazy basic block versioning](https://arxiv.org/abs/1411.0352) is a technique for collecting type information of executing code. A *basic block* is "a straight-line code sequence with no branches in except to the entry and no branches out except at the exit" ([retrieved from Wikipedia, 21/6/2023](https://en.wikipedia.org/wiki/Basic_block)). The code you write usually consists of multiple basic blocks. Lazy basic block versioning means we only generate code at runtime, block by block, as we observe the types. This allows us to collect runtime type information at basic block boundaries and optimize our basic blocks as we generate them.
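
To make the idea more concrete, here is a toy sketch in plain Python. It is not code from this repository: the names `versions`, `generate_add_block` and `run_add_block` are hypothetical, and a single `a + b` stands in for a whole basic block. It only illustrates that a block version is generated lazily, the first time a given combination of types is observed, and then reused.

```python
# Toy model of lazy basic block versioning. All names here are hypothetical.
from typing import Callable, Dict, Tuple

# Cache of block versions, keyed by (block name, observed operand types).
versions: Dict[Tuple[str, Tuple[type, ...]], Callable] = {}


def generate_add_block(types: Tuple[type, ...]) -> Callable:
    """Generate a version of an `a + b` block for the given operand types."""
    if types == (int, int):
        return lambda a, b: int.__add__(a, b)    # int-specialised version
    if types == (float, float):
        return lambda a, b: float.__add__(a, b)  # float-specialised version
    return lambda a, b: a + b                    # generic fallback


def run_add_block(a, b):
    """Run the block, generating a new version only when new types are seen."""
    key = ("add_block", (type(a), type(b)))
    if key not in versions:  # lazy: nothing is generated ahead of time
        versions[key] = generate_add_block(key[1])
    return versions[key](a, b)


print(run_add_block(1, 2))      # generates the (int, int) version
print(run_add_block(3, 4))      # reuses the (int, int) version
print(run_add_block(1.5, 2.0))  # generates a (float, float) version
print(len(versions))            # number of versions generated so far
```

In this sketch the cache key plays the role of the type information at a block boundary. The real interpreter works on bytecode rather than Python closures.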
30+
31+
Type propagation means we can take the type information gathered from a single basic block and propagate it down to the next basic block. Thus, over time, we can accumulate more and more type information.
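
As a rough illustration only, the sketch below models a type context as a dictionary from variable names to types, and a basic block as a list of `(dest, op, lhs, rhs)` tuples. Both representations, and the `propagate` function, are assumptions made for this example and do not reflect the data structures in `Python/tier2.c`.

```python
def propagate(context, instructions):
    """Return the type context after running a block of instructions."""
    ctx = dict(context)  # copy, so the incoming context is not mutated
    for dest, op, lhs, rhs in instructions:
        lhs_type, rhs_type = ctx.get(lhs), ctx.get(rhs)
        if op == "add" and lhs_type is rhs_type and lhs_type in (int, float):
            ctx[dest] = lhs_type  # int + int -> int, float + float -> float
        else:
            ctx[dest] = None      # unknown: a type check is still needed
    return ctx


block1 = [("x", "add", "a", "b")]
block2 = [("y", "add", "x", "a")]

ctx0 = {"a": float, "b": float}  # types observed at the entry of block1
ctx1 = propagate(ctx0, block1)   # now also knows that x is a float
ctx2 = propagate(ctx1, block2)   # block2 benefits from what block1 learned
print(ctx2)
```

The context leaving `block1` becomes the context entering `block2`, which is how information accumulates from block to block.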
32+
33+
Type check removal means removing type checks in dynamic Python. E.g. if you have ``a + b``, the fast path in Python has to check whether the operands are `int` or `str` or `float`, and if all those checks fail, fall back to a generic `+` function. These type checks incur overhead at runtime. If our type information already tells us the types, then we don't need any type checks at all! This means we can eliminate them.
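
The difference can be sketched in plain Python. The two functions below are hypothetical and only mimic the shape of the checks. CPython performs them in C inside the interpreter loop, not in Python code like this.

```python
def generic_add(a, b):
    """Mimics a fast path that must check operand types on every call."""
    if type(a) is int and type(b) is int:
        return int.__add__(a, b)
    if type(a) is float and type(b) is float:
        return float.__add__(a, b)
    if type(a) is str and type(b) is str:
        return str.__add__(a, b)
    return a + b  # generic fallback when no fast path applies


def add_known_floats(a, b):
    """Usable once both operands are known to be floats: no checks at all."""
    return float.__add__(a, b)


print(generic_add(1.5, 2.0))
print(add_known_floats(1.5, 2.0))
```

Both calls compute the same result, but the second skips every `type(...)` comparison. It is only safe when earlier type information guarantees that both operands are floats.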
34+
35+
We had a rudimentary test script and Doxygen documentation for all our functions to follow common SWE practices.
36+
37+
#### Orbital
838

939
This Orbital, we intend to refine pyLBBV. Our goals include, but are not limited to:
10-
- General refactoring
11-
- Bug fixing
12-
- Better unboxing + support unboxing of other types
13-
- More type specialised code
40+
- [X] General refactoring
41+
- [X] Bug fixing
42+
- [X] A more advanced type propagator.
43+
- [X] A more comprehensive test suite with Continuous Integration testing.
44+
- [ ] A copy and patch JIT compiler.
1445

15-
Furthermore, we intend to further pyLBBV by integrating a [Copy and Patch JIT](https://arxiv.org/abs/2011.13127) (using code written externally by Brandt Bucher) on top of the type specialisation PyLBBV provides. The culmination of these efforts will allow further optimisations to be implemented. We hope that this effort can allow Python to be as fast as a statically typed language. Our work here will be made publically available so that it will benefit CPython and its users, and we plan to collaborate closely with the developers of CPython in the course of this project.
46+
A JIT (Just-in-Time) compiler is a program that generates native machine code at runtime. [Copy and Patch](https://arxiv.org/abs/2011.13127) is a recently developed technique for fast compilation. The general idea is that compilation normally requires multiple steps, which makes it slow (recall how many steps your SICP meta-circular evaluator needs to execute JS)! Copy and patch makes compilation faster by skipping all the intermediate steps, and just creating "templates" for
47+
the final code. These "templates" are called *stencils* and they contain *holes*, i.e. missing values. All you have to do for compilation now is to copy the template and patch in the holes, which makes it very fast!
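
Here is a minimal sketch of the copy and patch steps, assuming a single hand-made stencil for the x86-64 sequence `mov eax, imm32; ret`. The names `STENCIL`, `HOLE_OFFSET` and `copy_and_patch` are made up for this example, and the bytes are only built, never executed. A real JIT would copy stencils into executable memory.

```python
import struct

# Hypothetical stencil for `mov eax, imm32; ret` on x86-64:
#   0xB8 <4-byte immediate> 0xC3
# The four zero bytes are the hole, to be filled in at compile time.
STENCIL = bytes([0xB8, 0x00, 0x00, 0x00, 0x00, 0xC3])
HOLE_OFFSET = 1  # byte offset of the hole inside the stencil


def copy_and_patch(value: int) -> bytes:
    """Copy the stencil, then patch `value` into its hole."""
    code = bytearray(STENCIL)                         # copy
    struct.pack_into("<i", code, HOLE_OFFSET, value)  # patch (little-endian int32)
    return bytes(code)


print(copy_and_patch(42).hex())
```

No parsing, optimisation or instruction selection happens at this point, which is where the speed of the technique comes from.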
48+
49+
The main copy and patch JIT compiler is written by Brandt Bucher, and we plan on integrating his work with ours. However, as such a compiler is not designed for use with basic block versioning, we will be handwriting x64 assembly code to get things working!
50+
51+
Our work here will be made publicly available so that it will benefit CPython and its users, and we plan to collaborate closely with the developers of CPython in the course of this project.
1652

1753
Due to Python being a huge language, pyLBBVAndPatch intends to support and optimise a subset of Python. Specifically pyLBBVAndPatch focuses on integer and float arithmetic. We believe this scope is sufficient as an exploration of the LBBV + Copy and Patch JIT approach for CPython.
1854

19-
# Project Plan
55+
##### General refactoring
56+
57+
We did a major refactor of our code generation machinery. This makes the code easier to reason about.
58+
2059

21-
- Fix bugs and refactor hot-patches in pyLBBV
22-
- Implement interprocedural type propagation
23-
- Implement typed object versioning
24-
- Implement unboxing of integers, floats and other primitive types
25-
- Implement Copy and Patch JIT (Just In Time) Compilation
60+
##### Bug fixing
2661

27-
## Immediate Goals
62+
- We managed to support recursive functions in Python!
63+
- We are now able to build ourselves using the standard Python build suite. This is a huge accomplishment because it requires supporting a lot of Python code.
64+
- We have fixed enough bugs that we can now run complex recursive Python functions like `help(1)`.
2865

29-
Refer to [the issues page](https://github.com/pylbbv/pylbbv/issues).
3066

31-
# Changelog over CS4215
67+
##### A more advanced type propagator
68+
69+
- We fixed bugs with how one type context (i.e. a snapshot) is deemed to be compatible with another type context.
70+
- We now support collecting negative type information (e.g. that a variable is not an `int` or not a `float`). This allows for better type check elimination.
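
To illustrate why negative type information helps, here is a small sketch. The `TypeInfo` class and its method names are hypothetical and are not taken from our implementation.

```python
class TypeInfo:
    """What is known about one variable's type (a hypothetical model)."""

    def __init__(self):
        self.known = None      # positive info: the exact type, once known
        self.excluded = set()  # negative info: types the variable is NOT

    def guard_passed(self, t):
        """A type guard for `t` succeeded, so the exact type is known."""
        self.known = t

    def guard_failed(self, t):
        """A type guard for `t` failed, so `t` is ruled out."""
        self.excluded.add(t)

    def needs_check(self, t):
        """A guard for `t` is needed only if its outcome is still unknown."""
        return self.known is None and t not in self.excluded


x = TypeInfo()
x.guard_failed(int)          # we took the "not an int" branch
print(x.needs_check(int))    # the outcome is already known
print(x.needs_check(float))  # still unknown, so a guard is needed
```

Without the `excluded` set, a later block would have to re-check `int` even though an earlier guard already ruled it out.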
71+
72+
73+
##### SWE dev best practices and CI testing
74+
75+
- We have added both feature tests and regression tests to our test script in [tier2_test.py](./tier2_test.py).
76+
- We now have continuous integration. We build our project and run tests using GitHub Actions for Windows 64-bit, on every pull request and commit to the repository!
77+
![image](./orbital/CI.png)
78+
- All PRs require review and an approval before merging is allowed. All tests in CI must also pass. This follows standard best practices.
79+
80+
## Architecture Diagram and Design
81+
82+
```mermaid
83+
sequenceDiagram
84+
autonumber
85+
participant CPython Compiler
86+
participant Tier 0
87+
participant Tier 1
88+
box rgba(66,120,99,0.1) Our Project
89+
participant Tier 2
90+
participant Type Propagator
91+
end
92+
CPython Compiler ->> Tier 0: Emits code for <br> Tier 0 to execute
93+
loop
94+
Tier 0 ->> Tier 1: Individual instructions <br>profile the data they <br> receive and overwrite <br> themselves with a more efficient <br> instruction.
95+
Tier 1 ->> Tier 0: If optimisation is <br> invalid, de-optimise <br> back to Tier 0
96+
end
97+
Tier 1 ->> Tier 2: Code executed more <br> than 63 times and <br> Tier 1 instructions <br> present, triggers the <br> Tier 2 interpreter
98+
loop Until exit scope executed
99+
loop until Tier 2 encounters type-specialised tier 1 instruction
100+
Note over Tier 2: Tier 2 copies Tier 1 <br> instructions into a <br> buffer to be executed <br> according to runtime <br> type info
101+
Tier 2 ->> Type Propagator: Requests type propagator
102+
Type Propagator ->> Tier 2: Type propagator <br> updates runtime type <br> info based on <br>newly emitted code
103+
end
104+
Note over Tier 2: Emits a typeguard <br> and executes Tier 2 code <br> until typeguard is hit.
105+
Tier 2 ->> Type Propagator: Requests type propagator
106+
Type Propagator ->> Tier 2: Type propagator updates <br> runtime type info <br> based on branch taken
107+
Note over Tier 2: Emits type specialised <br> branch according to <br> runtime type info
108+
end
109+
```
110+
111+
## What's left for our project
112+
113+
- The Copy and Patch JIT compiler.
114+
115+
## Evaluation and benchmarks
116+
117+
We will run the [bm_nbody.py](./bm_nbody.py) and [bm_float_unboxed.py](./bm_float_unboxed.py) scripts to gather results. For now we expect no performance improvement, as we have yet to implement the copy and patch JIT compiler.
118+
119+
## Changelog over CS4215
32120

33121
* Refactor: Typeprop codegen by @JuliaPoo in https://github.com/pylbbv/pylbbv/pull/1
34122
* Refactored type propagation codegen to more explicitly align with the formalism in our [technical report (Appendix)](https://github.com/pylbbv/pylbbv/blob/pylbbv/report/CPython_Tier_2_LBBV_Report_For_Repo.pdf) and remove duplicated logic
@@ -38,35 +126,27 @@ Refer to [the issues page](https://github.com/pylbbv/pylbbv/issues).
38126
* Perf: Improved typeprop by switching overwrite -> set by @JuliaPoo in https://github.com/pylbbv/pylbbv/pull/6
39127
* Stricter type propagation reduces type information loss
40128

41-
# Build instructions
129+
## Build instructions
42130

43131
You should follow the official CPython build instructions for your platform.
44132
https://devguide.python.org/getting-started/setup-building/
45133

46-
We have one major difference - you must have a pre-existing Python installation.
47-
Preferrably Python 3.9 or higher. On MacOS/Unix systems, that Python installation
48-
*must* be located at `python3`.
49-
50-
The main reason for this limitation is that Python is used to bootstrap the compilation
51-
of Python. However, since our interpreter is unable to run a large part of the Python
52-
language, our interpreter cannot be used as a bootstrap Python.
53-
54134
During the build process, errors may be printed, and the build process may error. However,
55135
the final Python executable should still be generated.
56136

57-
# Where are files located? Where is documentation?
137+
## Where are files located? Where is documentation?
58138

59139
The majority of the changes and functionality are in `Python/tier2.c` where Doxygen documentation
60140
is written alongside the code, and in `Tools/cases_generator/` which contains the DSL implementation.
61141

62-
# Running tests
142+
## Running tests
63143

64144
We've written simple tests of the main functionalities.
65145
Unfortunately, we did not have time to write comprehensive tests, and it doesn't seem worth it either way given the experimental nature of this project.
66146

67147
After building, run `python tier2_test.py` or `python.bat tier2_test.py` (on Windows) in the repository's root folder.
68148

69-
# Debugging output
149+
## Debugging output
70150

71151
In `tier2.c`, two flags can be set to print debug messages:
72152
```c
@@ -76,3 +156,7 @@ In `tier2.c`, two flags can be set to print debug messages:
76156
// Prints typeprop debug messages
77157
#define TYPEPROP_DEBUG 0
78158
```
159+
160+
## Addendum
161+
162+
More details can be found in our [technical report](https://github.com/pylbbv/pylbbv/blob/pylbbv/report/CPython_Tier_2_LBBV_Report_For_Repo.pdf). For an introduction to pyLBBV, refer to our [presentation](https://docs.google.com/presentation/d/e/2PACX-1vQ9eUaAdAgU0uFbEkyBbptcLZ4dpdRP-Smg1V499eogiwlWa61EMYVZfNEXg0xNaQvlmdNIn_07HItn/pub?start=false&loop=false&delayms=60000).

orbital/CI.png

53.4 KB