Commit 5ec0660

Split from assembly-cheat
0 parents  commit 5ec0660

21 files changed, 342 additions and 0 deletions

Makefile

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
.POSIX:

BIN_EXT ?= .bin
IN_EXT ?= .asm
OBJ_EXT ?= .o
OUT_EXT ?= .hd

INS := $(wildcard *$(IN_EXT))
OUTS := $(patsubst %$(IN_EXT),%$(OUT_EXT),$(INS))

.PHONY: all clean run
.PRECIOUS: %$(BIN_EXT) %$(OBJ_EXT)

all: $(OUTS)

%$(OUT_EXT): %$(BIN_EXT)
	od -An -tx1 '$<' | tail -c+2 > '$@'

%$(BIN_EXT): %$(OBJ_EXT)
	objcopy -O binary --only-section=.text '$<' '$@'

%$(OBJ_EXT): %$(IN_EXT)
	nasm -f elf32 -o '$@' '$<'
	@# For raw 16 bit. Would need to remove the objcopy step.
	@#nasm -f bin -o '$@' '$<'

clean:
	rm -f *$(BIN_EXT) *$(OBJ_EXT) *$(OUT_EXT)

run: all
	tail -n+1 *$(OUT_EXT)

README.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# x86 Instruction Encoding Tutorial

1. [Introduction](introduction.md)
1. [Global structure](global-structure.md)
1. [Intel manual format](intel-manual-format.md)
1. Examples
    1. [nop](nop.md)
    1. [push ebp](push-ebp.md)
    1. [mov eax, 1](mov-eax-1.md)
    1. [mov eax, ebx](mov-eax-ebx.md)
1. [Bibliography](bibliography.md)

add-eax-ebx.md

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
# add eax, ebx

Output:

    01 d8
    ^^ ^^

1. Opcode
1. ModR/M

Opcode bits:

    0 0 0 0 0 0 0 1
    ^^^^^^^^^^^ ^ ^
    1           2 3

1. This is an `add`.
2. Direction bit. 0: add REG into R/M, as encoded in the ModR/M byte. 1: the other way around.
3. Size bit. 1: 32-bit operands. 0: 8-bit operands.

ModR/M bits:

    1 1 0 1 1 0 0 0
    ^^^ ^^^^^ ^^^^^
    1   2     3

1. MOD = 3: REG and R/M are registers.
2. REG = 3: EBX
3. R/M = 0: EAX

So from the opcode, we add REG (EBX) into R/M (EAX).

Note that two encodings are possible for register-to-register operations: we could set the second-to-last opcode bit to 1 and swap both registers in the ModR/M byte.

Both possible encodings are documented on the instruction table:

    01 /r ADD r/m32, r32
    03 /r ADD r32, r/m32

`/r` says that a ModR/M byte follows the opcode, and that its REG and R/M fields hold the two operands.
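The bit-by-bit reading above can be checked with a short script. This is only a sketch in Python: the function name `decode_add_reg_reg` and the `REG32` table are hypothetical helpers, not part of this repository, and only the 32-bit register-to-register forms `01 /r` and `03 /r` are handled.

```python
# Hypothetical helper for illustration; not part of the tutorial's sources.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_add_reg_reg(code: bytes) -> str:
    """Decode a 2-byte `add` between two 32-bit registers."""
    opcode, modrm = code  # exactly two bytes are expected
    if opcode & 0b11111100 != 0:
        raise ValueError("opcode is not of the form 000000dw")
    direction = (opcode >> 1) & 1  # 0: REG is the source, 1: REG is the destination
    wide = opcode & 1              # 1: 32-bit operands, 0: 8-bit operands
    mod = modrm >> 6
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    if mod != 0b11 or not wide:
        raise ValueError("only 32-bit register-to-register forms are handled")
    src, dst = (reg, rm) if direction == 0 else (rm, reg)
    return f"add {REG32[dst]}, {REG32[src]}"


print(decode_add_reg_reg(bytes([0x01, 0xD8])))  # add eax, ebx
print(decode_add_reg_reg(bytes([0x03, 0xC3])))  # add eax, ebx
```

Both byte sequences decode to the same instruction, which is the redundancy described above.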

bibliography.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Bibliography

- Intel® 64 and IA-32 Architectures Software Developer’s Manual

    - section 2.1: binary serialization
    - section 3.1: documentation format

- <http://www.c-jump.com/CIS77/CPU/x86/lecture.html>

- <http://www.codeproject.com/Articles/662301/x-Instruction-Encoding-Revealed-Bit-Twiddling-fo>

- <http://wiki.osdev.org/X86-64_Instruction_Encoding>

- <http://www.strchr.com/machine_code_redundancy>

global-structure.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
# Global structure

Legend: `X-Y: description`, where `X` is the minimum and `Y` the maximum number of bytes.

- 0-4: instruction prefixes
- 1-3: opcode
- 0-1: ModR/M
- 0-1: SIB
- 0-4: displacement
- 0-4: immediate

The most interesting bytes to start learning are the opcode and ModR/M.

## Opcode

Says which instruction is being run.

Sometimes, this can be further decomposed into smaller parts which say where the data comes from. E.g. [push ebp](push-ebp.asm), documented in the manual as `+rd`.

## ModR/M

Says which operands the instruction works on. Bits, from most significant (7) to least significant (0):

    7 6 5 4 3 2 1 0
    ^^^ ^^^^^ ^^^^^
    1   2     3

1. MOD

    Determines how the next fields are interpreted.

    - 00: indirect addressing mode.
    - 01: same as 00, but an 8-bit displacement is added to the value before dereferencing.
    - 10: same as 01, but with a 32-bit displacement.
    - 11: the REG and R/M fields each refer to a register.

2. REG

    - 000 (0): EAX (AX if data size is 16 bits, AL if data size is 8 bits)
    - 001 (1): ECX/CX/CL
    - 010 (2): EDX/DX/DL
    - 011 (3): EBX/BX/BL
    - 100 (4): ESP/SP (AH if data size is defined as 8 bits)
    - 101 (5): EBP/BP (CH if data size is defined as 8 bits)
    - 110 (6): ESI/SI (DH if data size is defined as 8 bits)
    - 111 (7): EDI/DI (BH if data size is defined as 8 bits)

3. R/M
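The three ModR/M fields can be pulled apart with shifts and masks. The following Python sketch assumes the layout described above; `ModRM`, `split_modrm` and `REG_NAMES` are names made up for this illustration.

```python
from typing import NamedTuple


class ModRM(NamedTuple):
    """The three fields of a ModR/M byte (hypothetical helper type)."""
    mod: int
    reg: int
    rm: int


# Register names indexed by the 3-bit REG value, keyed by operand size in bits.
REG_NAMES = {
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"],
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    8: ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"],
}


def split_modrm(byte: int) -> ModRM:
    """Split a ModR/M byte into MOD (bits 7-6), REG (5-3) and R/M (2-0)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a ModR/M byte must fit in 8 bits")
    return ModRM(mod=byte >> 6, reg=(byte >> 3) & 0b111, rm=byte & 0b111)


fields = split_modrm(0xD8)
print(fields)                     # ModRM(mod=3, reg=3, rm=0)
print(REG_NAMES[32][fields.reg])  # ebx
print(REG_NAMES[8][fields.reg])   # bl
```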

## Prefixes

### 66

If given while in 16-bit mode, treat the operands as 32-bit.

If given while in 32-bit mode, treat the operands as 16-bit.
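To see the prefix in action, here is a sketch in Python that builds the bytes of a `mov` of an immediate into a 16- or 32-bit register (opcode `B8` plus the register number). The function `encode_mov_imm` and its parameters are assumptions made for this example; the rule it applies is simply "emit `66` when the operand size differs from the mode's default".

```python
import struct


def encode_mov_imm(reg: int, value: int, operand_bits: int, mode_bits: int = 32) -> bytes:
    """Encode `mov reg, value` for a 16- or 32-bit register (hypothetical helper)."""
    if operand_bits not in (16, 32) or mode_bits not in (16, 32):
        raise ValueError("only 16- and 32-bit sizes are handled")
    if not 0 <= reg <= 7:
        raise ValueError("register number must fit in 3 bits")
    # The 66 prefix is needed only when the operand size is not the mode's default.
    prefix = b"\x66" if operand_bits != mode_bits else b""
    # The immediate is little endian and as wide as the operand.
    imm = struct.pack("<H" if operand_bits == 16 else "<I", value)
    return prefix + bytes([0xB8 + reg]) + imm


print(encode_mov_imm(0, 1, 32).hex(" "))                # b8 01 00 00 00
print(encode_mov_imm(0, 1, 16).hex(" "))                # 66 b8 01 00
print(encode_mov_imm(0, 1, 32, mode_bits=16).hex(" "))  # 66 b8 01 00 00 00
```

The first two lines correspond to `mov eax, 1` and `mov ax, 1` assembled for 32-bit mode, as the Makefile does with `-f elf32`.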

intel-manual-format.md

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# Intel manual format

How the Intel manual documents instruction encodings. Each instruction table has the following columns:

- Opcode
- Instruction
- Op / En
- 64-Bit Mode
- Compat / Leg Mode
- `CPUID` feature flag
- Description

They are explained in section 3.1.

### Instruction

E.g.:

    XCHG EAX, r32

Means: takes 2 arguments:

- `EAX`: TODO
- `r32`: a 32-bit register

Other important values:

- `r/m32`: either a 32-bit register or a 32-bit memory operand
- `imm32`: a 32-bit value encoded directly in the instruction

### Op/En

Short for "Operand Encoding".

Refers to an entry on the "Instruction Operand Encoding" table.

Every instruction has its own "Instruction Operand Encoding" table.

TODO understand an operand encoding table, e.g. for `mov`.

### CPUID feature flag

Which CPU feature must be present for the instruction to be supported, as reported by `CPUID`.

### Compat / Leg Mode

- valid
- invalid: can be encoded, but generates an exception
- N.E.: not encodable

### 64-bit mode

- V: Supported.
- I: Not supported.
- N.E.: instruction syntax is not encodable in 64-bit mode (it may represent part of a sequence of valid instructions in other modes).
- N.P.: REX prefix does not affect the legacy instruction in 64-bit mode.
- N.I.: opcode is treated as a new instruction in 64-bit mode.
- N.S.: requires an address override prefix in 64-bit mode and is not supported. Using an address override prefix in 64-bit mode may result in model-specific execution behavior.

introduction.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Introduction

Assemble all `.asm` inputs and dump the resulting machine code as `.hd` hexdumps:

    sudo apt-get install nasm
    make run

To learn, rotate quickly between:

- the examples
- the general instruction organization
- the Intel manual

until your brain starts to absorb them.

mov-al-1.asm

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
mov al, 1

mov-ax-1.asm

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
mov ax, 1

mov-eax-1.asm

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
mov eax, 1

mov-eax-1.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
# mov eax, 1

Basic mov immediate instruction.

Output:

    b8 01 00 00 00
    ^^ ^^^^^^^^^^^
    1  2

1. Opcode
2. Immediate value: `1` in little endian

Opcode bits:

    1 0 1 1 1 0 0 0
    ^^^^^^^^^ ^^^^^
    1         2

1. What to do: move an immediate into a register.
2. Where to move to. `000` is `eax`.

Intel documentation says:

- Opcode: `B8 + rd id`.

    `+rd` says that the 3 bits at the end are the destination register.

    `id` says that a double word immediate follows.

- Op/En: `OI`.

    The "Instruction Operand Encoding" table for `mov` and `OI` says:

    - Operand 1: `opcode + rd (w)`
    - Operand 2: `imm8/16/32/64`
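As a cross-check of the `B8 + rd id` notation, the following Python sketch decodes such an instruction back to text. `decode_mov_r32_imm32` and `REG32` are hypothetical names used only for this example.

```python
import struct

# Register names indexed by the 3-bit value carried in the opcode (`+rd`).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_r32_imm32(code: bytes) -> str:
    """Decode a 5-byte `B8+rd id` instruction (hypothetical helper)."""
    if len(code) != 5 or code[0] & 0b11111000 != 0xB8:
        raise ValueError("expected a 5-byte B8+rd id instruction")
    reg = code[0] & 0b111                   # `+rd`: low 3 bits of the opcode
    (imm,) = struct.unpack("<I", code[1:])  # `id`: little-endian double word
    return f"mov {REG32[reg]}, {imm}"


print(decode_mov_r32_imm32(bytes.fromhex("b8 01 00 00 00")))  # mov eax, 1
print(decode_mov_r32_imm32(bytes.fromhex("bb 01 00 00 00")))  # mov ebx, 1
```

The second call shows how only the low 3 bits of the opcode change when the destination register changes, which is what `mov-ebx-1.asm` and `mov-ecx-1.asm` are meant to demonstrate.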

mov-eax-ebx.asm

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
mov eax, ebx

mov-eax-ebx.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# mov eax, ebx

Output:

    89 d8
    ^^ ^^
    1  2

1. Opcode
1. ModR/M

Opcode bits:

    1 0 0 0 1 0 0 1
    ^^^^^^^^^^^ ^ ^
    1           2 3

1. This is a `mov`.
2. Direction bit. 0: move REG to R/M, as encoded in the ModR/M byte. 1: the other way around.
3. Size bit. 1: 32-bit operands. 0: 8-bit operands.

ModR/M bits:

    1 1 0 1 1 0 0 0
    ^^^ ^^^^^ ^^^^^
    1   2     3

1. MOD = 3: REG and R/M are registers.
2. REG = 3: EBX
3. R/M = 0: EAX

So from the opcode, we move REG (EBX) into R/M (EAX).

Note that two encodings are possible for register-to-register operations: we could set the second-to-last opcode bit to 1 and swap both registers in the ModR/M byte.

Both possible encodings are documented on the instruction table:

    89 /r MOV r/m32, r32
    8B /r MOV r32, r/m32

`/r` says that a ModR/M byte follows the opcode, and that its REG and R/M fields hold the two operands.
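The two equivalent encodings can be generated side by side. This is a minimal Python sketch under the assumption that only 32-bit register-to-register moves matter; `encode_mov_reg_reg` and `REG32` are names invented for the example.

```python
from typing import Tuple

# Register numbers as used in the REG and R/M fields.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_mov_reg_reg(dst: str, src: str) -> Tuple[bytes, bytes]:
    """Return both encodings of `mov dst, src` for 32-bit registers."""
    d, s = REG32[dst], REG32[src]
    # 89 /r, MOV r/m32, r32: REG holds the source, R/M the destination.
    store_form = bytes([0x89, 0b11 << 6 | s << 3 | d])
    # 8B /r, MOV r32, r/m32: REG holds the destination, R/M the source.
    load_form = bytes([0x8B, 0b11 << 6 | d << 3 | s])
    return store_form, load_form


for encoding in encode_mov_reg_reg("eax", "ebx"):
    print(encoding.hex(" "))
# 89 d8
# 8b c3
```

The first line of output matches the hexdump above; the second is the alternative an assembler could have picked.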

mov-eax-x-val.asm

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
; mov value at address
mov eax, [x]
x:
db0 db 0xFF

mov-eax-x.asm

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
; mov address
mov eax, x
x:
db0 db 0xFF

mov-ebx-1.asm

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
; See how ebx is encoded.
mov ebx, 1

mov-ecx-1.asm

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
; See how ecx is encoded.
mov ecx, 1

nop.asm

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
nop

nop.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
# nop

`0x90` is the simple form.

There are also multi-byte forms that can be used for alignment.
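As an illustration of alignment padding, the Python sketch below fills the gap up to the next boundary with NOPs. The byte sequences in `NOPS` are the multi-byte forms that, to the best of my knowledge, the Intel manual recommends in its `NOP` entry; treat the table as an assumption to verify there. `nop_padding` is a hypothetical helper.

```python
# Assumed table of NOP forms by length in bytes; check against the Intel manual.
NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("66 90"),
    3: bytes.fromhex("0f 1f 00"),
    4: bytes.fromhex("0f 1f 40 00"),
    5: bytes.fromhex("0f 1f 44 00 00"),
    6: bytes.fromhex("66 0f 1f 44 00 00"),
    7: bytes.fromhex("0f 1f 80 00 00 00 00"),
    8: bytes.fromhex("0f 1f 84 00 00 00 00 00"),
    9: bytes.fromhex("66 0f 1f 84 00 00 00 00 00"),
}


def nop_padding(offset: int, alignment: int) -> bytes:
    """Return NOP bytes that advance `offset` to the next multiple of `alignment`."""
    remaining = -offset % alignment
    padding = b""
    while remaining:
        size = min(remaining, max(NOPS))  # use the longest form that still fits
        padding += NOPS[size]
        remaining -= size
    return padding


print(nop_padding(5, 8).hex(" "))  # 0f 1f 00
print(len(nop_padding(1, 16)))     # 15
```

Using a few long NOPs instead of many `0x90` bytes means fewer instructions to decode for the same amount of padding.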

push-ebp.asm

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
push ebp

push-ebp.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# push ebp

Output:

    55

Which is a single opcode.

The opcode can be further decomposed into the following bits:

    0 1 0 1 0 1 0 1
    ^^^^^^^^^ ^^^^^
    1         2

1. It is a `push`.
2. From where we will push. `101` is `ebp`.

This is documented as: opcode == `50+rd` in the Intel manual. The `+rd` part says that the last 3 bits indicate where to push from.
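The `50+rd` rule amounts to adding the register number to `0x50`. A small Python sketch, with `encode_push_r32`, `decode_push_r32` and `REG32` as hypothetical names for this example:

```python
# Register names indexed by the 3-bit value added to the base opcode.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def encode_push_r32(reg: str) -> bytes:
    """Encode `push reg` as 50+rd."""
    return bytes([0x50 + REG32.index(reg)])


def decode_push_r32(code: bytes) -> str:
    """Decode a single-byte 50+rd instruction."""
    (opcode,) = code  # exactly one byte is expected
    if opcode & 0b11111000 != 0x50:
        raise ValueError("not a 50+rd push")
    return f"push {REG32[opcode & 0b111]}"


print(encode_push_r32("ebp").hex())  # 55
print(decode_push_r32(b"\x55"))      # push ebp
```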
