Description
This issue is part of the larger epic of gh-104584. In PR gh-106393 I tried to implement branching, but it was premature. Here's a better design, following @markshannon's guidance.
We have the following jump instructions (not counting the instrumented versions):
Unconditional jumps:
- JUMP_BACKWARD
- JUMP_BACKWARD_NO_INTERRUPT
- JUMP_FORWARD
Branches, a.k.a. conditional jumps:
- POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE
- POP_JUMP_IF_NONE, POP_JUMP_IF_NOT_NONE
FOR_ITER's specializations:
- FOR_ITER_LIST
- FOR_ITER_TUPLE
- FOR_ITER_RANGE
- FOR_ITER_GEN
SEND
Add counters to POP_JUMP_IF_{FALSE,TRUE} to determine likeliness
The translation strategies could be as follows:
Unconditional jumps
JUMP_BACKWARD
- If this jumps to exactly the top of the current trace, emit a Tier 2 JUMP_TO_TOP uop, and stop projecting (i.e., exit the trace generation loop). The JUMP_TO_TOP uop implementation should include a CHECK_EVAL_BREAKER call.
- If this jumps anywhere else, emit a SAVE_IP uop with the destination of the jump, followed by an EXIT_TRACE uop, and stop projecting.
JUMP_BACKWARD_NO_INTERRUPT
- Since this is typically only used in special circumstances, just emit a SAVE_IP instruction with the destination and an EXIT_TRACE uop, and stop projecting.
- Alternatively, we could make CHECK_EVAL_BREAKER a separate UOP that is inserted for JUMP_BACKWARD but not for JUMP_BACKWARD_NO_INTERRUPT, and otherwise treat the two backward jumps the same.
JUMP_FORWARD
- Emit a SAVE_IP uop with the destination of the jump, and continue projecting from there (i.e., set instr to the destination of the jump). A sketch covering all three unconditional cases follows below.
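To make the control flow concrete, here is a hedged sketch of how these three cases might look in the superblock generator's translation loop. Everything here is illustrative: ADD_UOP, INSTR_IP, JUMP_TARGET, and trace_top are assumed helper names, not the actual CPython API.
```c
/* Illustrative only: ADD_UOP, INSTR_IP, JUMP_TARGET, and trace_top
   are hypothetical helpers, not the real CPython names. */
switch (opcode) {
    case JUMP_BACKWARD: {
        _Py_CODEUNIT *dest = JUMP_TARGET(instr, oparg);
        if (dest == trace_top) {
            ADD_UOP(JUMP_TO_TOP, 0);  /* its implementation also checks the eval breaker */
        }
        else {
            ADD_UOP(SAVE_IP, INSTR_IP(dest));
            ADD_UOP(EXIT_TRACE, 0);
        }
        goto done;  /* stop projecting */
    }
    case JUMP_BACKWARD_NO_INTERRUPT:
        ADD_UOP(SAVE_IP, INSTR_IP(JUMP_TARGET(instr, oparg)));
        ADD_UOP(EXIT_TRACE, 0);
        goto done;  /* stop projecting */
    case JUMP_FORWARD:
        instr = JUMP_TARGET(instr, oparg);  /* continue projecting at the destination */
        ADD_UOP(SAVE_IP, INSTR_IP(instr));
        break;
}
```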
Conditional jumps (branches)
POP_JUMP_IF_FALSE and friends
Consider the following Python code:
if cond:
    block
rest
This translates roughly to the following Tier 1 bytecode (using B1, B2, ... to label Tier 1 instructions, and <cond>, <block>, etc. to represent code blocks that evaluate or execute the corresponding Python fragments):
B1: <cond>
B2: POP_JUMP_IF_FALSE B4
B3: <block>
B4: <rest>
B5:
I propose the following translation into Tier 2 uops, assuming the branch is "unlikely":
SAVE_IP B1
<cond>
SAVE_IP B2
JUMP_IF_FALSE stub
POP_TOP
SAVE_IP B3
<block>
SAVE_IP B4
EXIT_TRACE
stub:
POP_TOP
SAVE_IP B4
EXIT_TRACE
Where JUMP_IF_FALSE inspects the top of stack but doesn't pop it, and takes an argument naming its jump target within the Tier 2 uop instruction sequence.
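A minimal sketch of what the Tier 2 interpreter's hand-written case for this uop could look like, assuming a pc index into the uop sequence and the usual stack_pointer conventions (illustrative, not the actual implementation):
```c
case JUMP_IF_FALSE: {
    PyObject *cond = stack_pointer[-1];  /* peek at the top of stack; no pop */
    int is_true = PyObject_IsTrue(cond);
    if (is_true < 0) {
        goto error;  /* evaluating truthiness raised an exception */
    }
    if (is_true == 0) {
        pc = oparg;  /* jump to the stub within the uop sequence */
    }
    break;
}
```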
If the branch is "likely", we do this instead:
SAVE_IP B1
<cond>
SAVE_IP B2
JUMP_IF_TRUE stub
POP_TOP
SAVE_IP B4
<rest>
SAVE_IP B5
EXIT_TRACE
stub:
POP_TOP
SAVE_IP B3
EXIT_TRACE
Note how in this case, <rest> is projected as part of the trace, while <block> is not, since the likely case is that we jump over <block> to <rest>.
For the other simple conditional jumps (POP_JUMP_IF_TRUE, POP_JUMP_IF_NONE, POP_JUMP_IF_NOT_NONE) we do the same: if the jump is unlikely, emit a JUMP_IF_XXX uop and a stub; if the jump is likely, emit the inverse JUMP_IF_NOT_XXX uop and a different stub, and continue projecting at the destination of the original jump bytecode.
I propose to have hand-written cases both in the superblock generator and in the Tier 2 interpreter for these, since the translations are too irregular to fit easily in the macro expansion data structure. The stub generation will require additional logic and data structures in translate_bytecode_to_trace() to keep track of the stubs required so far, the available space for those, and the back-patching required to set the operands for the JUMP_IF_XXX uops.
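One possible shape for that bookkeeping, with made-up names and a deliberately simplified uop struct (an assumption-laden sketch, not the real data structures):
```c
/* Hypothetical bookkeeping: the main trace grows from the front of
   the buffer, stubs are appended at the back, and each JUMP_IF_XXX
   is back-patched once its stub's position is known. */
typedef struct {
    int opcode;
    int oparg;
} uop;  /* simplified stand-in for the real uop instruction struct */

typedef struct {
    int jump_index;    /* trace[] index of the JUMP_IF_XXX uop to patch */
    int tier1_target;  /* Tier 1 instruction the stub's SAVE_IP names */
} stub_fixup;

static void
patch_stubs(uop *trace, int stub_start, stub_fixup *fixups, int n)
{
    for (int i = 0; i < n; i++) {
        int s = stub_start + 3 * i;             /* POP_TOP, SAVE_IP, EXIT_TRACE */
        trace[fixups[i].jump_index].oparg = s;  /* back-patch the branch target */
        trace[s]     = (uop){POP_TOP, 0};
        trace[s + 1] = (uop){SAVE_IP, fixups[i].tier1_target};
        trace[s + 2] = (uop){EXIT_TRACE, 0};
    }
}
```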
FOR_ITER and (especially) its specializations
The common case for these is not to jump. The bytecode definitions are too complex to duplicate in hand-written Tier 2 uops. My proposal is to change these in bytecodes.c so that, instead of using the JUMPBY(n) macro, they use JUMPBY_POP_DISPATCH(n), which in Tier 1 translates into just JUMPBY(n), but in Tier 2 translates into roughly
frame->prev_instr += (n);
Py_DECREF(stack_pointer[-1]);
stack_pointer -= 1;
goto exit;
thereby exiting the trace when the corresponding for-loop terminates.
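A sketch of how the dual expansion might be spelled; whether this is done with preprocessor guards as below or inside the case generator is an open implementation detail, and the guard name TIER_ONE is an assumption:
```c
/* Illustrative definition only; the real spelling and location
   (bytecodes.c / the case generator) may differ. */
#ifdef TIER_ONE
    #define JUMPBY_POP_DISPATCH(n)  JUMPBY(n)
#else  /* Tier 2 */
    #define JUMPBY_POP_DISPATCH(n)          \
        do {                                \
            frame->prev_instr += (n);       \
            Py_DECREF(stack_pointer[-1]);   \
            stack_pointer -= 1;             \
            goto exit;                      \
        } while (0)
#endif
```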
I am assuming here that most loops have several iterations. I don't think it's worth special-casing the occasional for-loop that almost always immediately terminates.
SEND
Possibly we could treat this the same as FOR_ITER. But right now I propose to just punt here, and when we encounter it, stop projecting, as we do with any other unsupported bytecode instruction.
Linked PRs
- gh-106529: Support FOR_ITER specializations as uops #106542
- gh-106529: Support JUMP_BACKWARD #106543
- gh-106529: Implement POP_JUMP_IF_XXX uops #106551
- GH-106529: Define POP_JUMP_IF_NONE in terms of POP_JUMP_IF_TRUE #106599
- gh-106529: Silence compiler warning in jump target patching #106613
- gh-106529: Split FOR_ITER_RANGE into uops #106638
- gh-106529: Implement JUMP_FORWARD in uops (with test) #106651
- gh-106529: Split FOR_ITER_{LIST,TUPLE} into uops #106696
- gh-106529: Fix subtle Tier 2 edge case with list iterator #106756
- gh-106529: Generate uops for POP_JUMP_IF_[NOT_]NONE #106796
- gh-106529: Make FOR_ITER a viable uop #112134
- gh-106529: Cleanups split off gh-112134 #112214
Activity
gvanrossum commented on Jul 7, 2023
Separate from all this is the design for the counters to be used to determine whether a particular POP_JUMP_IF_XXX is more likely to jump or not. @markshannon's suggestion is to add a 16 bit cache entry to those bytecode instructions, initialized with a pattern of alternating ones and zeros. Each time we execute the instruction we shift the value left by 1, shifting in a 1 for a jump taken or a 0 for a jump not taken. When the question is asked, "is this jump likely", we simply count the bits, and if it's mostly ones we assume it is, if it's mostly zeros we assume it isn't. If it's half-half, well, we have a tough choice.
UPDATE: See gh-109039.
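As a self-contained illustration of the shift-register idea (my own sketch; the names are invented, and the eventual implementation in gh-109039 may differ):
```c
#include <stdint.h>

/* 16-bit branch history, initialized to alternating bits. */
#define BRANCH_HISTORY_INIT 0x5555  /* 0101010101010101 */

static inline uint16_t
record_branch(uint16_t history, int taken)
{
    /* Shift in a 1 for "jump taken", a 0 for "not taken". */
    return (uint16_t)((history << 1) | (taken ? 1 : 0));
}

static inline int
jump_likely(uint16_t history)
{
    /* Count the one bits: mostly ones => likely taken.
       (__builtin_popcount is GCC/Clang; a portable build would
       need a fallback.) */
    int ones = __builtin_popcount(history);
    return ones > 8;  /* of 16 bits; exactly half is the tough case */
}
```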
gvanrossum commented on Jul 9, 2023
Q. In Tier 2, should we have POP_JUMP_IF_XXX, which (like its Tier 1 counterpart) combines a pop and a branch, or should we emit separate POP_TOP instructions?
Emitting separate POP_TOP uops seems perhaps simpler, but it's less performant to interpret, since the explicit POP_TOP will call Py_DECREF() on the stack top, whereas POP_JUMP_IF_TRUE/FALSE won't need to do that at all (True and False are immortal), and POP_JUMP_IF_[NOT_]NONE can skip that in the None case.
I don't have much intuition yet about which will be easier for the optimizer to handle. It's easy to change.
This also reminds me of a question for @brandtbucher: assuming that at some point the copy-and-patch machinery will use a Tier 2 uop sequence as input, how would you want to handle hand-written uops like JUMP_IF_FALSE or SAVE_IP? (If the answer is long or complicated, we should take this to the faster-cpython/ideas tracker.)
gvanrossum commented on Jul 9, 2023
Similarly, in his original guidance @markshannon suggests that POP_JUMP_IF_NONE be translated into LOAD_CONST None; IS_OP; POP_JUMP_IF_TRUE (and similar for POP_JUMP_IF_NOT_NONE). I think that if we decide to do that, it should be done in compile.c when generating Tier 1 bytecode, since in the optimizer it would be hard to ensure there is a co_consts entry for None. In the short term it might be simpler to just add additional uops JUMP_IF_[NOT_]NONE with hand-written implementations and translations. If we're doing it in compile.c I'd like to enroll @iritkatriel's help. But it definitely feels like a pessimization to replace one bytecode instruction with three (and the same for uops in the Tier 2 interpreter).
UPDATE: Never mind, Mark did this in GH-106599 using a macro in bytecodes.c.
markshannon commented on Jul 10, 2023
POP_JUMP_IF_[TRUE|FALSE] seems not only faster than JUMP_IF_[TRUE|FALSE], but more consistent with other operations. The VM is a stack machine; it is normal for an operation to consume its argument(s).
The point of my original suggestion was that using additional micro-ops can help us to reduce the number of branch micro-ops. As you point out, None might not be present in co_consts, so we would want to translate POP_JUMP_IF_NONE as IS_NONE; POP_JUMP_IF_TRUE.
We can add as many micro-ops as we want. We just want to minimize the number of micro-ops that need special treatment, like branches, jumps and calls.
All micro-ops are hand written, so SAVE_IP would be handled like any other simple micro-op.
mdboom commented on Jul 10, 2023
How is this different from a counter, where you add for jump taken and subtract for jump not taken? I understand it records some recency information, but if that isn't used, isn't it better to get a more accurate number (with a memory larger than 16 entries?)
markshannon commented on Jul 10, 2023
A counter doesn't tell you anything about the recent history.
A branch that goes one way 50 times and then the other way 50 times is different from one that alternates.
I suspect it won't matter much in practice, though, as long as we have some hint as to the bias of the branch.
An alternative is to count the length and direction of the last two runs, but that's more complex and slower.
gvanrossum commented on Jul 10, 2023
There seems to be some terminological confusion here. I've been thinking of uops as anything that the Tier 2 interpreter can interpret. Most of those (over 100 of them) are generated by the generator. There's only a handful of hand-written ones (i.e., explicitly written down in the switch in _PyUopExecute) -- the OGs (EXIT_TRACE and SAVE_IP) and now the POP_JUMP_IF_XXX crowd.
But it doesn't matter, we seem to be in agreement.
gvanrossum commented on Jul 10, 2023
To clarify, my concern about the few hand-written uops is that IIUC @brandtbucher's copy-and-patch tooling uses as its input either bytecodes.c or one of its .c.h outputs in order to generate its templates. For the hand-written uops it will need a way to use the hand-written code as template source. Maybe we need to add them to bytecodes.c after all, marked as uops.
gvanrossum commented on Nov 15, 2023
The design for POP_JUMP_IF_FALSE etc. has changed again since gh-112045 -- there are now no longer any "branching" ops in Tier 2; everything is done through deoptimization.
However, we have another case: unspecialized FOR_ITER. According to the stats this is now by far the most frequent untranslated opcode, so we would like to translate it to Tier 2. FOR_ITER is already macro(FOR_ITER) = _SPECIALIZE_FOR_ITER + _FOR_ITER.
Alas, _FOR_ITER (the non-specializing part of the macro) is not straightforward to translate -- it does a bunch of work (calling tp_iternext) and if this returns NULL without setting an exception it jumps over the corresponding END_FOR instruction. The best I can think of for Tier 2 is to make it deopt in that case to the END_FOR opcode (which will handle the stack pops). But this feels like a bit of a hack. @markshannon?
gvanrossum commented on Nov 16, 2023
That seems to work, except the deopt code must be duplicated (since we can't use target as the deopt destination) and we must deopt to the instruction past the END_FOR, since that instruction pops two stack items, but we only have one (the iterator). See gh-112134.
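Pieced together from the two comments above, the exhaustion path ends up looking roughly like this (my sketch with illustrative names; past_end_for stands in for however the translator supplies the post-END_FOR deopt target, and the real code in gh-112134 may differ):
```c
PyObject *next = (*Py_TYPE(iter)->tp_iternext)(iter);
if (next == NULL) {
    if (PyErr_Occurred()) {
        if (!PyErr_ExceptionMatches(PyExc_StopIteration)) {
            goto error;                /* a real error, not exhaustion */
        }
        PyErr_Clear();
    }
    /* Duplicated deopt code: pop the lone iterator and resume Tier 1
       at the instruction *past* END_FOR (END_FOR itself would pop two
       items, but only one is on the stack here). */
    Py_DECREF(iter);
    stack_pointer -= 1;
    frame->prev_instr = past_end_for;  /* hypothetical deopt target */
    goto deoptimize;
}
stack_pointer[0] = next;               /* otherwise push and stay in the trace */
stack_pointer += 1;
```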